mean by group, excluding selected rows

Question

I'll take this old post as a reference. The modified dataset looks like the following:

    df <- data.frame(dive = factor(sample(c("dive1","dive2","dive3","dive4"), 14, replace=TRUE)),
                     speed = runif(14)
    )
    > df
        dive       speed
    1  dive1 0.627296799
    2  dive1 0.288594538
    3  dive4 0.598177856
    4  dive2 0.371158436
    5  dive2 0.827468739
    6  dive3 0.485977449
    7  dive2 0.151295215
    8  dive4 0.773988372
    9  dive2 0.567155356
    10 dive1 0.008585884
    11 dive4 0.433648371
    12 dive2 0.759196515
    13 dive2 0.641193241
    14 dive3 0.089451537

I would like to modify the column speed so that it contains the mean per group (the same entry for every row of the group) for dive1 and dive2, and do nothing (keep df as it is) for the other two groups.

I tried with if (and, of course, group_by and summarise), but that's not what I want: I receive a warning message and only 4 results...

    df2 <- if(!(df$dive %in% c("dive3", "dive4"))){
      summarise(group_by(df, dive), speed = mean(speed))
    }
    Warning message:
    In if (!(df$dive %in% c("dive3", "dive4"))) { :
      the condition has length > 1 and only the first element will be used
    > df2
    # A tibble: 4 x 2
      dive  speed
      <fct> <dbl>
    1 dive1 0.860
    2 dive2 0.460
    3 dive3 0.277
    4 dive4 0.330

Answer 1

Score: 4

    library(dplyr)

    df %>%
      group_by(dive) %>%
      mutate(speed = if (first(dive) %in% c("dive1", "dive2")) mean(speed) else speed) %>%
      ungroup()
    # # A tibble: 14 × 2
    #    dive   speed
    #    <fct>  <dbl>
    #  1 dive4 0.548
    #  2 dive3 0.156
    #  3 dive4 0.207
    #  4 dive3 0.148
    #  5 dive4 0.886
    #  6 dive1 0.498
    #  7 dive3 0.690
    #  8 dive1 0.498
    #  9 dive4 0.0968
    # 10 dive3 0.596
    # 11 dive2 0.447
    # 12 dive2 0.447
    # 13 dive3 0.859
    # 14 dive3 0.663

or perhaps a little shorter using the .by argument of mutate (available in dplyr 1.1.0 and later):

    df %>%
      mutate(speed = if (first(dive) %in% c("dive1", "dive2")) mean(speed) else speed,
             .by = dive)
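
To see why the grouped if works where the original attempt did not: df$dive %in% c("dive3", "dive4") yields one logical value per row, while if expects a single TRUE or FALSE (in R 4.2.0 and later a longer condition is an error rather than a warning). Inside a grouped mutate, first(dive) is a single value per group, so the condition has length 1. The sketch below checks this on a small made-up data set; toy, targets, res_if and res_vec are names invented here for illustration, and the ifelse variant is an alternative I am assuming you might prefer, not something from the question.

```r
library(dplyr)

# Hypothetical toy data (not the question's random sample),
# chosen so that the group means are easy to verify by hand.
toy <- data.frame(
  dive  = factor(c("dive1", "dive1", "dive2", "dive2", "dive3", "dive4")),
  speed = c(1, 3, 10, 20, 5, 7)
)

# Groups whose rows should be replaced by the group mean.
targets <- c("dive1", "dive2")

# Grouped `if`: within each group, first(dive) is one value,
# so the condition passed to `if` has length 1.
res_if <- toy %>%
  group_by(dive) %>%
  mutate(speed = if (first(dive) %in% targets) mean(speed) else speed) %>%
  ungroup()

# Vectorized variant: mean(speed) is computed per group and recycled,
# and ifelse() chooses between it and the original value row by row.
res_vec <- toy %>%
  group_by(dive) %>%
  mutate(speed = ifelse(dive %in% targets, mean(speed), speed)) %>%
  ungroup()

res_if$speed
# 2 2 15 15 5 7
```

Both variants keep all rows in their original order; only the speed values of dive1 and dive2 change.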

If I misunderstood, and instead you want to reduce the two groups to a single row while keeping other groups as-is (not reduced), then perhaps:

    df %>%
      filter(dive %in% c("dive1", "dive2")) %>%
      summarize(speed = mean(speed), .by = dive) %>%
      bind_rows(filter(df, !dive %in% c("dive1", "dive2")))
    #     dive     speed
    # 1  dive1 0.4983562
    # 2  dive2 0.4470575
    # 3  dive4 0.5477776
    # 4  dive3 0.1558491
    # 5  dive4 0.2068528
    # 6  dive3 0.1479428
    # 7  dive4 0.8858552
    # 8  dive3 0.6896862
    # 9  dive4 0.0967569
    # 10 dive3 0.5961494
    # 11 dive3 0.8593978
    # 12 dive3 0.6634452
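
One thing to be aware of with this second variant is that the summarized rows come first and the untouched rows are appended after them, so the original row order is not preserved. A small sketch on made-up data illustrates this; toy, targets and reduced are hypothetical names, and I use group_by with .groups = "drop" here only on the assumption that it also runs on dplyr versions older than 1.1.0.

```r
library(dplyr)

# Hypothetical toy data with the groups deliberately interleaved.
toy <- data.frame(
  dive  = factor(c("dive1", "dive3", "dive1", "dive2", "dive4", "dive2")),
  speed = c(1, 5, 3, 10, 7, 20)
)

# Groups that should be collapsed to a single row each.
targets <- c("dive1", "dive2")

reduced <- toy %>%
  filter(dive %in% targets) %>%                        # rows to collapse
  group_by(dive) %>%
  summarize(speed = mean(speed), .groups = "drop") %>% # one row per target group
  bind_rows(filter(toy, !dive %in% targets))           # append untouched rows

reduced$speed
# 2 15 5 7
```

Here the six input rows become four: one each for dive1 and dive2, followed by the dive3 and dive4 rows as they were.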

huangapple
  • Posted on April 19, 2023, 23:01:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76056033.html