June 1, 2023, 18:11:15

How can I create a frequency plot/histogram in R using ggplot2 while normalizing to the total of a factor?

Question

My problem is the following: I can make a figure in which the data are weighted relative to the entire population, but not relative to their own subpopulation. To illustrate with an example:

Suppose I have a dataset DS, with two columns: X and type.
X is a continuous value ranging from -5 to 5, and type is either A, B or C.

How would I create a frequency plot of X in which each tuple is weighted by the total of its type, not the total of all tuples in the dataset?

This is my closest attempt, yet it is weighted by the total population:

figure1 <- ggplot(data = DS, aes(x = X)) + geom_freqpoly(aes(colour = type, y = after_stat(count / sum(count)))) + ...

It's not surprising that this normalizes to the entire dataset, since sum(count) runs over the bins of all types at once, but I don't know how to make it normalize to each subset only.

Using dput(), I generate the following example dataframe:

DS <- structure(list(X = c(0, -0.01, 0.042944432215413, 0.0431301011419889, 0.042944432215413, 0.0424042102083902, 0.2100000012, 0.13513333335333), TimePoint = c("early", "early", "late", "mid", "mid", "early", "late", "early")), row.names = c(NA, 8L), class = "data.frame")

Here 'X' is the continuous value and 'TimePoint' is the factor, which can be either 'early', 'mid' or 'late'.

Answer 1

Score: 0

One option would be to use e.g. ave() to compute the total count per group, i.e. per TimePoint, and divide each bin's count by that total:

library(ggplot2)
ggplot(data = DS, aes(x = X)) +
  geom_freqpoly(
    aes(
      colour = TimePoint,
      y = after_stat(count / ave(count, group, FUN = sum))
    )
  )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
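
To see what the ave() call inside after_stat() is doing, here is a minimal base R sketch on plain vectors. The count and group values are made up for illustration and only stand in for the per-bin counts and group ids that stat_bin() computes; they are not taken from DS.

```r
# Hypothetical per-bin counts for two groups (illustrative values, not from DS)
count <- c(2, 1, 1, 3, 2)
group <- c(1, 1, 1, 2, 2)

# ave() returns a vector as long as `count`; each element holds the
# total of the group that element belongs to
group_total <- ave(count, group, FUN = sum)

# Dividing by the group total normalizes within each group,
# whereas count / sum(count) would divide by the total over all groups
proportion <- count / group_total

print(data.frame(group, count, group_total, proportion))

# Within each group the proportions add up to 1
print(tapply(proportion, group, sum))
```

Because the division happens per group, each line in the frequency polygon sums to 1 on its own, regardless of how many observations the other groups have.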
