Obtaining a summary of grouped counts in R

Question

This should be simple but I have been stumped by it: I am trying to figure out an efficient method for obtaining summary stats of a grouped count. Here's a toy example:

library(dplyr)  # provides tibble(), %>%, group_by() and count()
df = tibble(pid = c(1,2,2,3,3,3,4,4,4,4), y = rnorm(10))
df %>% group_by(pid) %>% count(pid)

which outputs the expected

# A tibble: 4 × 2
# Groups:   pid [4]
    pid     n
  <dbl> <int>
1     1     1
2     2     2
3     3     3
4     4     4

However, what if I want a summary of those grouped counts? Attempting to mutate a new variable or to use add_count hasn't worked, I assume because the variables are different sizes. For instance:

df %>% group_by(pid) %>% count(pid) %>% mutate(count = summary(n))

generates an error. What would be a simple way to generate summary statistics of the grouped counts (e.g., min, max, mean, etc.)?

Answer 1

Score: 3

mutate is for adding columns to a data frame. You don't want that here; you need to pull the column out of the data frame and summarise it as a plain vector.

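df %>%
  count(pid) %>%   # one row per pid, with the count in column n
  pull(n) %>%      # extract n as a plain integer vector
  summary()        # min, quartiles, median, mean, max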
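
If you would rather keep the result as a data frame with only the statistics you name, a summarise() call after count() is one option. This is just a sketch under the same toy data as the question; the column names min_n, mean_n and max_n and the object name count_stats are made up for illustration. Note that group_by() is left out: count(pid) already does the grouping internally and returns an ungrouped tibble, so summarise() sees all four counts at once. With the grouped version from the question, each group has a single row, which is why a length-6 summary() result cannot be fitted into mutate().

library(dplyr)

# Same toy data as in the question; y is random, but the counts depend only on pid
df <- tibble(pid = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4), y = rnorm(10))

# count() on an ungrouped tibble returns an ungrouped tibble,
# so summarise() collapses the four counts into a single row
count_stats <- df %>%
  count(pid) %>%
  summarise(
    min_n  = min(n),   # smallest group size
    mean_n = mean(n),  # average group size
    max_n  = max(n)    # largest group size
  )

count_stats

For this data the group sizes are 1, 2, 3 and 4, so the single row holds a minimum of 1, a mean of 2.5 and a maximum of 4.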

huangapple
  • Posted on February 8, 2023 at 23:59:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75388448.html