How can I create a frequency plot/histogram in R using ggplot2 while normalizing to the total of a factor?

Question

My problem is the following: I can make a figure in which the data are weighted relative to the entire population, but not relative to their own subpopulation. To illustrate with an example:

Suppose I have a dataset DS with two columns: X and type.
X is a continuous value ranging from -5 to 5, and type is either A, B or C.

How would I create a frequency plot of X in which each tuple is weighted by the total of its type, not by the total of all tuples in the dataset?

This is my closest attempt, yet it weights by the total population:

figure1 <- ggplot(data = DS, aes(x = X)) + geom_freqpoly(aes(colour = type, y = after_stat(count / sum(count)))) + ...

It's not surprising that this normalizes to the entire dataset, but I don't know how to make it normalize within each subset only.

Using dput(), I generated the following example data frame:

DS <- structure(list(X = c(0, -0.01, 0.042944432215413, 0.0431301011419889, 0.042944432215413, 0.0424042102083902, 0.2100000012, 0.13513333335333), TimePoint = c("early", "early", "late", "mid", "mid", "early", "late", "early")), row.names = c(NA, 8L), class = "data.frame")

Here 'X' is the continuous value and 'TimePoint' is the factor, which can be 'early', 'mid' or 'late'.
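
To make the difference between the two normalizations concrete, here is a minimal base R sketch on the example data. The bin breaks are an assumption chosen only for illustration (they are not what ggplot2 would pick), and the names `breaks`, `counts`, `prop_total` and `prop_group` are hypothetical.

```r
# Example data from the question, rebuilt as a plain data frame
DS <- data.frame(
  X = c(0, -0.01, 0.042944432215413, 0.0431301011419889,
        0.042944432215413, 0.0424042102083902, 0.2100000012,
        0.13513333335333),
  TimePoint = c("early", "early", "late", "mid", "mid",
                "early", "late", "early")
)

# Hypothetical bin edges, chosen for illustration only
breaks <- c(-0.05, 0.05, 0.15, 0.25)
DS$bin <- cut(DS$X, breaks = breaks)

# Counts per bin (rows) and TimePoint (columns)
counts <- table(DS$bin, DS$TimePoint)

# Normalized to the whole dataset: all cells together sum to 1
prop_total <- prop.table(counts)

# Normalized within each TimePoint: every column sums to 1
prop_group <- prop.table(counts, margin = 2)

print(round(prop_total, 3))
print(round(prop_group, 3))
```

With these breaks, the first bin holds 3 of the 4 'early' rows, so it is 3/8 of the whole dataset but 3/4 of its own group. The second figure is the quantity the plot should show.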

Answer 1

Score: 0

One option would be to use e.g. ave() to compute the count per group or TimePoint:

library(ggplot2)

ggplot(data = DS, aes(x = X)) +
  geom_freqpoly(
    aes(
      colour = TimePoint,
      y = after_stat(count / ave(count, group, FUN = sum))
    )
  )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
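
As a rough check of the approach above, the sketch below pulls the computed values out of the plot with layer_data() and sums them per group. It assumes ggplot2 3.3.0 or later (for after_stat()) and sets `bins = 30` explicitly, which only silences the message. The names `p`, `p_alt`, `ld` and `group_totals` are hypothetical. It also tries a possible alternative, `density * width`, which relies on stat_bin() computing `density` separately for each group.

```r
library(ggplot2)

# Example data from the question
DS <- data.frame(
  X = c(0, -0.01, 0.042944432215413, 0.0431301011419889,
        0.042944432215413, 0.0424042102083902, 0.2100000012,
        0.13513333335333),
  TimePoint = c("early", "early", "late", "mid", "mid",
                "early", "late", "early")
)

# Plot from the answer: each bin count divided by its own group's total
p <- ggplot(DS, aes(x = X)) +
  geom_freqpoly(
    aes(colour = TimePoint,
        y = after_stat(count / ave(count, group, FUN = sum))),
    bins = 30
  )

# Possible alternative: density is computed per group by stat_bin(),
# so density * width should give the same per-group proportion
p_alt <- ggplot(DS, aes(x = X)) +
  geom_freqpoly(
    aes(colour = TimePoint, y = after_stat(density * width)),
    bins = 30
  )

# Extract the computed layer data and sum the y values within each group
ld <- layer_data(p)
group_totals <- tapply(ld$y, ld$group, sum)
print(group_totals)

# Compare the two variants
print(all.equal(ld$y, layer_data(p_alt)$y))
```

If the normalization is per group, every entry of `group_totals` should be 1 (up to floating-point error). With the original `count / sum(count)` the group totals would instead add up to 1 across all groups combined.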

huangapple
  • Posted on June 1, 2023, 18:11:15
  • When reposting, please keep this link: https://go.coder-hub.com/76380844.html