How to summarise data conditional on another variable in R?

Question

I'd like to summarise data by calculating the mean of values in one column conditional on the values in another column. Here's an example:

    dat <- data.frame(group = c("A", "A", "A", "A", "B", "B", "B", "B"),
                      xy = c(1:4, 1:4),
                      val = 1:8)
    > dat
      group xy val
    1     A  1   1
    2     A  2   2
    3     A  3   3
    4     A  4   4
    5     B  1   5
    6     B  2   6
    7     B  3   7
    8     B  4   8

The desired output is:

      group     var val
    1     A mean1_2 1.5
    2     A mean3_4 3.5
    3     B mean1_2 5.5
    4     B mean3_4 7.5

I thought about combining summarise and case_when in dplyr, but that does not work (or I have not used it correctly):

    dat %>%
      group_by(group) %>%
      summarise(mean1_2 = case_when(xy %in% 1:2 ~ mean(val)),
                mean3_4 = case_when(xy %in% 3:4 ~ mean(val)))
    `summarise()` has grouped output by 'group'. You can override using the `.groups` argument.
    # A tibble: 8 x 3
    # Groups:   group [2]
      group mean1_2 mean3_4
      <chr>   <dbl>   <dbl>
    1 A         2.5    NA
    2 A         2.5    NA
    3 A        NA       2.5
    4 A        NA       2.5
    5 B         6.5    NA
    6 B         6.5    NA
    7 B        NA       6.5
    8 B        NA       6.5
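
A likely reading of this output: mean(val) is evaluated over every row of the group (2.5 for A, 6.5 for B) before the condition is applied, and the conditional then returns one value per input row, so nothing is collapsed. The base R sketch below reproduces that for group A only. It uses ifelse() as a stand-in for case_when() (an assumption made to keep the example free of package dependencies), and the variable names are illustrative.

```r
# Values for group "A" only, copied from the example data
xy  <- 1:4
val <- 1:4

# The mean is taken over all four rows, regardless of the condition
whole_group_mean <- mean(val)

# A vectorised conditional yields one element per row: the whole-group mean
# where the condition holds and NA elsewhere, so four values come back
attempt <- ifelse(xy %in% 1:2, whole_group_mean, NA_real_)

# Subsetting val before averaging restricts the mean to the matching rows
mean1_2 <- mean(val[xy %in% 1:2])
mean3_4 <- mean(val[xy %in% 3:4])

print(attempt)
print(c(mean1_2 = mean1_2, mean3_4 = mean3_4))
```

Subsetting inside mean() gives the intended numbers, but used directly in summarise() it would produce one column per condition, which is the wide layout the question wants to avoid.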

Is there another way? I'd like to avoid spreading the data to wide format.

Answer 1

Score: 1

I'm not sure about your condition, but you may try:

    dat %>%
      # pair up consecutive xy values: 1 and 2 -> key 1, 3 and 4 -> key 2
      mutate(key = ceiling(xy/2)) %>%
      group_by(group, key) %>%
      # build the label from the xy values in each pair and average val
      summarise(var = paste0(xy, collapse = "_"),
                val = mean(val)) %>%
      mutate(var = paste0('mean', var)) %>%
      select(-key)
      group var       val
      <chr> <chr>   <dbl>
    1 A     mean1_2   1.5
    2 A     mean3_4   3.5
    3 B     mean1_2   5.5
    4 B     mean3_4   7.5
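
If the condition is meant to be the explicit sets of xy values from the question rather than consecutive pairs, one possible variation is to use case_when() to label the rows first and then group by that label. This is only a sketch under that assumption; it assumes dplyr 1.0.0 or later for the .groups argument, and the name res is illustrative.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(group = c("A", "A", "A", "A", "B", "B", "B", "B"),
                  xy = c(1:4, 1:4),
                  val = 1:8)

res <- dat %>%
  # label each row by the condition it satisfies
  mutate(var = case_when(xy %in% 1:2 ~ "mean1_2",
                         xy %in% 3:4 ~ "mean3_4")) %>%
  # one summary row per group and label, so the result stays in long format
  group_by(group, var) %>%
  summarise(val = mean(val), .groups = "drop")

print(res)
```

Rows whose xy matches neither condition would get an NA label and form their own group, so they may need to be filtered out on real data.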

huangapple
  • Posted on February 8, 2023 at 15:07:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382398.html