Get mean values per sample, arranged by another column of IDs

Question


Here is a problem that I would like to solve using the tidyverse. It is difficult to explain in words, but the example below lays it out.

I have a dataset with several names. Each name is represented by one or more IDs (between 1 and 5 in the real data). Each ID has 5 values, ordered by draw number.

library(tidyverse)

set.seed(190)

# Store the example data as df so it can be reused later
df <- dplyr::tibble(nms = c(rep("A", 5), rep("B", 10), rep("C", 10)),
                    draw = rep(1:5, times = 5),
                    id = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
                    val = sample(1:4, replace = TRUE, size = 25))

df %>% print(n = 25)
#> # A tibble: 25 × 4
#>    nms    draw id      val
#>    <chr> <int> <chr> <int>
#>  1 A         1 A1        3
#>  2 A         2 A1        4
#>  3 A         3 A1        2
#>  4 A         4 A1        4
#>  5 A         5 A1        1
#>  6 B         1 B1        2
#>  7 B         2 B1        1
#>  8 B         3 B1        3
#>  9 B         4 B1        3
#> 10 B         5 B1        2
#> 11 B         1 B2        1
#> 12 B         2 B2        2
#> 13 B         3 B2        2
#> 14 B         4 B2        1
#> 15 B         5 B2        2
#> 16 C         1 C1        2
#> 17 C         2 C1        1
#> 18 C         3 C1        1
#> 19 C         4 C1        3
#> 20 C         5 C1        4
#> 21 C         1 C2        1
#> 22 C         2 C2        4
#> 23 C         3 C2        4
#> 24 C         4 C2        2
#> 25 C         5 C2        4

Created on 2023-05-31 with reprex v2.0.2
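
The structure described above (one or more ids per name, five draws per id) can be checked with a short sketch. It types the data in from the printed values rather than relying on the seed, and stores it as df, the name the answer further down assumes. The names ids_per_name and draws_per_id are chosen here for illustration only.

```r
library(dplyr)

# Data typed in from the printed tibble (not regenerated from the seed)
df <- tibble(
  nms  = c(rep("A", 5), rep("B", 10), rep("C", 10)),
  draw = rep(1:5, times = 5),
  id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
  val  = as.integer(c(3, 4, 2, 4, 1,   # A1
                      2, 1, 3, 3, 2,   # B1
                      1, 2, 2, 1, 2,   # B2
                      2, 1, 1, 3, 4,   # C1
                      1, 4, 4, 2, 4))  # C2
)

# How many distinct ids does each name have?
ids_per_name <- df %>%
  distinct(nms, id) %>%
  count(nms, name = "n_ids")

# How many draws does each id have?
draws_per_id <- df %>%
  count(nms, id, name = "n_draws")
```

In this example A has one id while B and C have two each, and every id has five draws.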

I would like to get the average of each draw for each name (nms). That is, for every combination of nms and draw, I want the mean of val across all ids belonging to that name. Some names, like A, have only one id, so mean_val is identical to val. B and C have multiple ids, so their mean_val is the average across those ids for each draw. I would like the result to look like this:

# A tibble: 15 × 3
#   nms    draw mean_val
#   <chr> <int>    <dbl>
# 1 A         1      3  
# 2 A         2      4  
# 3 A         3      2  
# 4 A         4      4  
# 5 A         5      1  
# 6 B         1      1.5
# 7 B         2      1.5
# 8 B         3      2.5
# 9 B         4      2  
#10 B         5      2  
#11 C         1      1.5
#12 C         2      2.5
#13 C         3      2.5
#14 C         4      2.5
#15 C         5      4  

Answer 1

Score: 0


I think this will give you what you want or at least get you close.

output <- df %>%
  group_by(nms, draw) %>%
  summarize(mean_val = mean(val))

output
#> # A tibble: 15 × 3
#> # Groups:   nms [3]
#>    nms    draw mean_val
#>    <chr> <int>    <dbl>
#>  1 A         1      3  
#>  2 A         2      4  
#>  3 A         3      2  
#>  4 A         4      4  
#>  5 A         5      1  
#>  6 B         1      1.5
#>  7 B         2      1.5
#>  8 B         3      2.5
#>  9 B         4      2  
#> 10 B         5      2  
#> 11 C         1      1.5
#> 12 C         2      2.5
#> 13 C         3      2.5
#> 14 C         4      2.5
#> 15 C         5      4  
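
The result above is still grouped by nms (note the Groups line in the output). If an ungrouped tibble is preferred, one option is the .groups argument of summarise(). This is a minimal sketch, not part of the original answer: the data are typed in from the question's printed output, and the name mean_by_draw is chosen for illustration.

```r
library(dplyr)

# Data typed in from the question's printed tibble
df <- tibble(
  nms  = c(rep("A", 5), rep("B", 10), rep("C", 10)),
  draw = rep(1:5, times = 5),
  id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
  val  = as.integer(c(3, 4, 2, 4, 1,   # A1
                      2, 1, 3, 3, 2,   # B1
                      1, 2, 2, 1, 2,   # B2
                      2, 1, 1, 3, 4,   # C1
                      1, 4, 4, 2, 4))  # C2
)

# Average val across ids for each name/draw pair, then drop the grouping
mean_by_draw <- df %>%
  group_by(nms, draw) %>%
  summarise(mean_val = mean(val), .groups = "drop")
```

With dplyr 1.1.0 or later, summarise(df, mean_val = mean(val), .by = c(nms, draw)) should give an equivalent ungrouped result without a separate group_by() step, although it keeps groups in order of first appearance rather than sorting them.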

huangapple
  • Published on 2023-06-01 06:13:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377616.html