Get mean values per sample, arranged by another column of IDs
Question
Here is a problem that I would like to solve using the tidyverse. It's difficult to explain, but the example lays it out.
I have a dataset with several names. Each name is represented by one or more IDs (from 1 to 5 in the real data). Each ID has 5 values, ordered by draw number.
library(tidyverse)
set.seed(190)
# Store the example data as `df` so it can be reused later
df <- tibble(nms = c(rep("A", 5), rep("B", 10), rep("C", 10)),
             draw = rep(1:5, times = 5),
             id = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
             val = sample(1:4, replace = TRUE, size = 25))
print(df, n = 25)
#> # A tibble: 25 × 4
#> nms draw id val
#> <chr> <int> <chr> <int>
#> 1 A 1 A1 3
#> 2 A 2 A1 4
#> 3 A 3 A1 2
#> 4 A 4 A1 4
#> 5 A 5 A1 1
#> 6 B 1 B1 2
#> 7 B 2 B1 1
#> 8 B 3 B1 3
#> 9 B 4 B1 3
#> 10 B 5 B1 2
#> 11 B 1 B2 1
#> 12 B 2 B2 2
#> 13 B 3 B2 2
#> 14 B 4 B2 1
#> 15 B 5 B2 2
#> 16 C 1 C1 2
#> 17 C 2 C1 1
#> 18 C 3 C1 1
#> 19 C 4 C1 3
#> 20 C 5 C1 4
#> 21 C 1 C2 1
#> 22 C 2 C2 4
#> 23 C 3 C2 4
#> 24 C 4 C2 2
#> 25 C 5 C2 4
<sup>Created on 2023-05-31 with reprex v2.0.2</sup>
I would like to get the average of each draw for each name (`nms`). This means averaging `val` across all of a name's `id`s within each draw. Some names, like `A`, have only one id, so `mean_val` will simply equal `val`; but since `B` and `C` have multiple ids, their `mean_val` will be the average across ids for each draw. I would like the results to look like this:
# A tibble: 15 × 3
# nms draw mean_val
# <chr> <int> <dbl>
# 1 A 1 3
# 2 A 2 4
# 3 A 3 2
# 4 A 4 4
# 5 A 5 1
# 6 B 1 1.5
# 7 B 2 1.5
# 8 B 3 2.5
# 9 B 4 2
#10 B 5 2
#11 C 1 1.5
#12 C 2 2.5
#13 C 3 2.5
#14 C 4 2.5
#15 C 5 4
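To make the target concrete, here is a small hand check for name B, a sketch using only base R. The vectors `b1` and `b2` are illustrative names, with values copied from the printed data above for ids B1 and B2.

```r
# val for the two ids of name B, in draw order 1..5 (copied from the data above)
b1 <- c(2, 1, 3, 3, 2)  # id B1
b2 <- c(1, 2, 2, 1, 2)  # id B2

# Averaging element-wise across ids gives one mean_val per draw
mean_val_b <- (b1 + b2) / 2
mean_val_b
#> [1] 1.5 1.5 2.5 2.0 2.0
```

These five numbers match rows 6 to 10 of the desired output.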
Answer 1
Score: 0
I think this will give you what you want or at least get you close.
output <- df %>%
group_by(nms, draw) %>%
summarize(mean_val = mean(val))
# A tibble: 15 × 3
# Groups: nms [3]
nms draw mean_val
<chr> <int> <dbl>
1 A 1 3
2 A 2 4
3 A 3 2
4 A 4 4
5 A 5 1
6 B 1 1.5
7 B 2 1.5
8 B 3 2.5
9 B 4 2
10 B 5 2
11 C 1 1.5
12 C 2 2.5
13 C 3 2.5
14 C 4 2.5
15 C 5 4
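For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. It assumes `df` is the tibble from the question; the `val` column is typed in from the printed data rather than regenerated, so the result does not depend on the random seed. The `.groups = "drop"` argument is an addition here, not part of the answer above: it returns an ungrouped tibble and avoids the message `summarise()` prints about the remaining grouping.

```r
library(dplyr)

# Example data, with val copied from the printed reprex output in the question
df <- tibble(
  nms  = rep(c("A", "B", "C"), times = c(5, 10, 10)),
  draw = rep(1:5, times = 5),
  id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
  val  = c(3, 4, 2, 4, 1,   # A1
           2, 1, 3, 3, 2,   # B1
           1, 2, 2, 1, 2,   # B2
           2, 1, 1, 3, 4,   # C1
           1, 4, 4, 2, 4)   # C2
)

# One row per (nms, draw): val is averaged over every id sharing that name
output <- df %>%
  group_by(nms, draw) %>%
  summarise(mean_val = mean(val), .groups = "drop")

print(output, n = 15)
```

Note that `id` never appears in the grouping: grouping by `nms` and `draw` alone is what pools the ids of each name together.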