Get mean values per sample, arranged by another column of IDs

Question

Here is a problem that I would like to solve using the tidyverse. It is hard to describe in words, but the example below makes it concrete.

I have a dataset with several names. Each name is represented by one or more IDs (between 1 and 5 in the real data). Each ID has 5 values, ordered by draw number.

    library(tidyverse)
    set.seed(190)
    # Stored as df so that the answer's code can refer to it
    df <- dplyr::tibble(nms = c(rep("A", 5), rep("B", 10), rep("C", 10)),
                        draw = c(rep(1:5, times = 5)),
                        id = c(rep(c("A1", "B1", "B2", "C1", "C2"), each = 5)),
                        val = c(sample(1:4, replace = TRUE, size = 25)))
    print(df, n = 25)
    #> # A tibble: 25 × 4
    #>    nms    draw id      val
    #>    <chr> <int> <chr> <int>
    #>  1 A         1 A1        3
    #>  2 A         2 A1        4
    #>  3 A         3 A1        2
    #>  4 A         4 A1        4
    #>  5 A         5 A1        1
    #>  6 B         1 B1        2
    #>  7 B         2 B1        1
    #>  8 B         3 B1        3
    #>  9 B         4 B1        3
    #> 10 B         5 B1        2
    #> 11 B         1 B2        1
    #> 12 B         2 B2        2
    #> 13 B         3 B2        2
    #> 14 B         4 B2        1
    #> 15 B         5 B2        2
    #> 16 C         1 C1        2
    #> 17 C         2 C1        1
    #> 18 C         3 C1        1
    #> 19 C         4 C1        3
    #> 20 C         5 C1        4
    #> 21 C         1 C2        1
    #> 22 C         2 C2        4
    #> 23 C         3 C2        4
    #> 24 C         4 C2        2
    #> 25 C         5 C2        4

Created on 2023-05-31 with reprex v2.0.2

I would like to get the average of each draw for each name (nms). That is, for each draw, val should be averaged across all the ids that belong to the same name. Some names, like A, have only one id, so mean_val will simply equal val. B and C have multiple ids, so their mean_val will be the average over those ids for each draw. I would like the result to look like this:

    # A tibble: 15 × 3
    #   nms    draw mean_val
    #   <chr> <int>    <dbl>
    # 1 A         1      3
    # 2 A         2      4
    # 3 A         3      2
    # 4 A         4      4
    # 5 A         5      1
    # 6 B         1      1.5
    # 7 B         2      1.5
    # 8 B         3      2.5
    # 9 B         4      2
    #10 B         5      2
    #11 C         1      1.5
    #12 C         2      2.5
    #13 C         3      2.5
    #14 C         4      2.5
    #15 C         5      4
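
As a quick sanity check on the table above, the B rows can be reproduced by hand in base R. This is only a sketch: the vector names b1, b2 and b_mean are made up for illustration, and their values are copied from the printed data for ids B1 and B2.

```r
# val for draws 1-5 of each id under name B (copied from the printed data)
b1 <- c(2, 1, 3, 3, 2)
b2 <- c(1, 2, 2, 1, 2)

# One row per draw, one column per id; the row means are B's mean_val
b_mean <- rowMeans(cbind(b1, b2))
b_mean
#> [1] 1.5 1.5 2.5 2.0 2.0
```

With more than two ids per name, further columns would be added to cbind() in the same way.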

Answer 1

Score: 0

I think this will give you what you want, or at least get you close: group by nms and draw, then take the mean of val within each group.

    output <- df %>%
      group_by(nms, draw) %>%          # one group per name/draw pair
      summarize(mean_val = mean(val))  # average val over the ids in each group
    output
    #> # A tibble: 15 × 3
    #> # Groups:   nms [3]
    #>    nms    draw mean_val
    #>    <chr> <int>    <dbl>
    #>  1 A         1      3
    #>  2 A         2      4
    #>  3 A         3      2
    #>  4 A         4      4
    #>  5 A         5      1
    #>  6 B         1      1.5
    #>  7 B         2      1.5
    #>  8 B         3      2.5
    #>  9 B         4      2
    #> 10 B         5      2
    #> 11 C         1      1.5
    #> 12 C         2      2.5
    #> 13 C         3      2.5
    #> 14 C         4      2.5
    #> 15 C         5      4
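
For anyone who wants to run this without relying on the random seed, here is a self-contained sketch of the same approach. Two things in it are my own assumptions rather than part of the answer: the val column is typed in by hand from the printed data in the question, and `.groups = "drop"` is added so the result comes back ungrouped (without it, the result stays grouped by nms, as the "Groups: nms [3]" header shows).

```r
library(dplyr)

# Values copied from the printed data in the question, so the result
# does not depend on the random number generator.
df <- tibble(
  nms  = rep(c("A", "B", "C"), times = c(5, 10, 10)),
  draw = rep(1:5, times = 5),
  id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
  val  = c(3, 4, 2, 4, 1,   # A1
           2, 1, 3, 3, 2,   # B1
           1, 2, 2, 1, 2,   # B2
           2, 1, 1, 3, 4,   # C1
           1, 4, 4, 2, 4)   # C2
)

output <- df %>%
  group_by(nms, draw) %>%
  # .groups = "drop" returns a plain, ungrouped tibble
  summarize(mean_val = mean(val), .groups = "drop")

print(output, n = 15)
```

Dropping the grouping is optional here; it mainly avoids surprises if output is later passed to further dplyr verbs that behave differently on grouped data.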

huangapple
  • Posted on 2023-06-01 06:13:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377616.html