Obtaining a summary of grouped counts in R
Question
This should be simple, but I have been stumped by it: I am trying to figure out an efficient way to obtain summary statistics of a grouped count. Here's a toy example:
library(dplyr)
df = tibble(pid = c(1,2,2,3,3,3,4,4,4,4), y = rnorm(10))
df %>% group_by(pid) %>% count(pid)
which outputs the expected result:
# A tibble: 4 × 2
# Groups:   pid [4]
    pid     n
  <dbl> <int>
1     1     1
2     2     2
3     3     3
4     4     4
However, what if I want a summary of those grouped counts? Attempting to mutate a new variable or use add_count hasn't worked, I assume because the variables are different sizes. For instance:
df %>% group_by(pid) %>% count(pid) %>% mutate(count = summary(n))
generates an error. What would be a simple way to generate summary statistics of the grouped counts (e.g., min, max, mean, etc.)?
Answer 1
Score: 3
mutate is for adding columns to a data frame. You don't want that here; you need to pull the column out of the data frame.
df %>%
count(pid) %>%
pull(n) %>%
summary()
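If the statistics are wanted as columns of a data frame rather than as a printed summary, one possible variation is to collapse the counts with summarise(). This is only a sketch, assuming dplyr is attached; the names count_stats, min_n, max_n, mean_n and median_n are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Same toy data as in the question; y is random and does not affect the counts.
df <- tibble(pid = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4), y = rnorm(10))

# count() on an ungrouped data frame returns an ungrouped result,
# so summarise() collapses all four per-pid counts into a single row.
count_stats <- df %>%
  count(pid) %>%
  summarise(
    min_n = min(n),
    max_n = max(n),
    mean_n = mean(n),
    median_n = median(n)
  )

count_stats
```

Note that this starts from the ungrouped df. If group_by(pid) is kept in front of count(), the result stays grouped by pid, so an ungroup() would probably be needed before summarise() to get one row for all groups instead of one row per pid.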