Get mean values per sample, arranged by another column of IDs
Question
Here is a problem that I would like to solve using the tidyverse. It's difficult to explain, but the example lays it out.
I have a dataset with several names. Each name is represented by one or more IDs (between 1 and 5 in the real data). Each ID has 5 values, ordered by draw number.
library(tidyverse)
set.seed(190)
# Store the data as df so it can be reused below
df <- tibble(nms  = c(rep("A", 5), rep("B", 10), rep("C", 10)),
             draw = rep(1:5, times = 5),
             id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
             val  = sample(1:4, replace = TRUE, size = 25))
print(df, n = 25)
#> # A tibble: 25 × 4
#>    nms    draw id      val
#>    <chr> <int> <chr> <int>
#>  1 A         1 A1        3
#>  2 A         2 A1        4
#>  3 A         3 A1        2
#>  4 A         4 A1        4
#>  5 A         5 A1        1
#>  6 B         1 B1        2
#>  7 B         2 B1        1
#>  8 B         3 B1        3
#>  9 B         4 B1        3
#> 10 B         5 B1        2
#> 11 B         1 B2        1
#> 12 B         2 B2        2
#> 13 B         3 B2        2
#> 14 B         4 B2        1
#> 15 B         5 B2        2
#> 16 C         1 C1        2
#> 17 C         2 C1        1
#> 18 C         3 C1        1
#> 19 C         4 C1        3
#> 20 C         5 C1        4
#> 21 C         1 C2        1
#> 22 C         2 C2        4
#> 23 C         3 C2        4
#> 24 C         4 C2        2
#> 25 C         5 C2        4
Created on 2023-05-31 with reprex v2.0.2
I would like to get the mean val for each draw within each name (nms), averaging over all of that name's ids. Some names, like A, have only one id, so mean_val is the same as val; B and C have multiple ids, so mean_val is the average of their vals at each draw. I would like the result to look like this:
# A tibble: 15 × 3
#    nms    draw mean_val
#    <chr> <int>    <dbl>
#  1 A         1      3
#  2 A         2      4
#  3 A         3      2
#  4 A         4      4
#  5 A         5      1
#  6 B         1      1.5
#  7 B         2      1.5
#  8 B         3      2.5
#  9 B         4      2
# 10 B         5      2
# 11 C         1      1.5
# 12 C         2      2.5
# 13 C         3      2.5
# 14 C         4      2.5
# 15 C         5      4
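The averaging described above can be spelled out in two explicit steps: first one value per id and draw, then the mean of those values across all ids of a name. This is a minimal sketch, assuming dplyr is installed. The names df_fixed, per_id and two_step are hypothetical, and the values are typed in from the reprex output so the check does not depend on which random number generator version set.seed() uses:

```r
library(dplyr)

# Hypothetical literal copy of the reprex values printed above, so the
# check does not depend on the random number generator
df_fixed <- tibble::tibble(
  nms  = rep(c("A", "B", "C"), times = c(5, 10, 10)),
  draw = rep(1:5, times = 5),
  id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
  val  = c(3, 4, 2, 4, 1,   # A1
           2, 1, 3, 3, 2,   # B1
           1, 2, 2, 1, 2,   # B2
           2, 1, 1, 3, 4,   # C1
           1, 4, 4, 2, 4)   # C2
)

# Step 1: one value per id and draw (no change here, because each id
# already has exactly one row per draw)
per_id <- df_fixed %>%
  group_by(nms, id, draw) %>%
  summarize(id_val = mean(val), .groups = "drop")

# Step 2: average the per-id values across all ids of the same name
two_step <- per_id %>%
  group_by(nms, draw) %>%
  summarize(mean_val = mean(id_val), .groups = "drop")

print(two_step, n = 15)
```

Because every id here has exactly one row per draw, step 1 changes nothing, and the result equals a single group_by(nms, draw) followed by summarize(). The two forms can only differ when ids have different numbers of rows for the same draw, since the one-step version then weights each id by its row count.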
Answer 1
Score: 0
I think this will give you what you want, or at least get you close:
# Average val over all ids of the same name, separately for each draw
output <- df %>%
  group_by(nms, draw) %>%
  summarize(mean_val = mean(val))
output
# A tibble: 15 × 3
# Groups:   nms [3]
   nms    draw mean_val
   <chr> <int>    <dbl>
 1 A         1      3  
 2 A         2      4  
 3 A         3      2  
 4 A         4      4  
 5 A         5      1  
 6 B         1      1.5
 7 B         2      1.5
 8 B         3      2.5
 9 B         4      2  
10 B         5      2  
11 C         1      1.5
12 C         2      2.5
13 C         3      2.5
14 C         4      2.5
15 C         5      4
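The "# Groups: nms [3]" line in this output appears because summarize() by default drops only the last grouping variable (draw), so the result is still grouped by nms. If an ungrouped tibble like the one in the question is wanted, here is a minimal sketch, assuming dplyr 1.0.0 or later for the .groups argument and dplyr 1.1.0 or later for the .by argument (output_by is a hypothetical name):

```r
library(dplyr)

# Example data built the same way as in the question's reprex
set.seed(190)
df <- tibble::tibble(
  nms  = rep(c("A", "B", "C"), times = c(5, 10, 10)),
  draw = rep(1:5, times = 5),
  id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
  val  = sample(1:4, replace = TRUE, size = 25)
)

# Option 1: drop all grouping after summarizing (dplyr >= 1.0.0)
output <- df %>%
  group_by(nms, draw) %>%
  summarize(mean_val = mean(val), .groups = "drop")

# Option 2: per-operation grouping with .by (dplyr >= 1.1.0); the result is
# never grouped, and groups are returned in order of first appearance
output_by <- df %>%
  summarize(mean_val = mean(val), .by = c(nms, draw))

output
```

Both options give the same 15 rows for this data. The .by form is convenient when the grouping is needed for only one step.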