How to summarise data conditional on another variable in R?

Question

I'd like to summarise data by calculating the mean of values in one column conditional on the values in another column. Here's an example:

dat <- data.frame(group = c("A", "A", "A", "A", "B", "B", "B", "B"),
                  xy = c(1:4, 1:4),
                  val = 1:8)
> dat
  group xy val
1     A  1   1
2     A  2   2
3     A  3   3
4     A  4   4
5     B  1   5
6     B  2   6
7     B  3   7
8     B  4   8

The desired output is:

  group     var val
1     A mean1_2 1.5
2     A mean3_4 3.5
3     B mean1_2 5.5
4     B mean3_4 7.5

I thought about combining summarise and case_when in dplyr but that does not work (or I've not used it correctly).

dat %>%
  group_by(group) %>%
  summarise(mean1_2 = case_when(xy %in% 1:2 ~ mean(val)),
            mean3_4 = case_when(xy %in% 3:4 ~ mean(val)))
`summarise()` has grouped output by 'group'. You can override using the `.groups` argument.
# A tibble: 8 x 3
# Groups:   group [2]
  group mean1_2 mean3_4
  <chr>   <dbl>   <dbl>
1 A         2.5    NA  
2 A         2.5    NA  
3 A        NA       2.5
4 A        NA       2.5
5 B         6.5    NA  
6 B         6.5    NA  
7 B        NA       6.5
8 B        NA       6.5

Is there another way? I'd like to avoid spreading the data to wide format.

Answer 1

Score: 1

I'm not sure about your exact condition, but you could try:

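library(dplyr)

dat %>%
  # pair up consecutive xy values: 1 and 2 get key 1, 3 and 4 get key 2
  mutate(key = ceiling(xy/2)) %>%
  group_by(group, key) %>%
  # build the label from the xy values in each pair and average val
  summarise(var = paste0(xy, collapse = "_"),
            val = mean(val)) %>%
  mutate(var = paste0('mean', var)) %>%
  select(-key)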

  group var       val
  <chr> <chr>   <dbl>
1 A     mean1_2   1.5
2 A     mean3_4   3.5
3 B     mean1_2   5.5
4 B     mean3_4   7.5
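
As a possible variation on the approach above, the conditions from the question can be spelled out with case_when and used as a grouping label, which may be easier to adapt when the xy ranges are not evenly sized. The attempt in the question returns eight rows because mean(val) is evaluated over the whole group and case_when returns one value per row. This is only a sketch under the assumption that dplyr 1.0.0 or later is installed (for the .groups argument); the name res is just for illustration.

```r
library(dplyr)

dat <- data.frame(group = c("A", "A", "A", "A", "B", "B", "B", "B"),
                  xy = c(1:4, 1:4),
                  val = 1:8)

res <- dat %>%
  # label each row by its condition; rows matching neither condition get NA
  mutate(var = case_when(xy %in% 1:2 ~ "mean1_2",
                         xy %in% 3:4 ~ "mean3_4")) %>%
  # group by both the original group and the new label
  group_by(group, var) %>%
  # one mean per group/label pair, returned as an ungrouped tibble
  summarise(val = mean(val), .groups = "drop")

res
```

The result stays in long format, with one row per combination of group and var, so no reshaping to wide format is needed.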

huangapple
  • Posted on February 8, 2023, 15:07:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382398.html