Obtaining a summary of grouped counts in R

Question

This should be simple but I have been stumped by it: I am trying to figure out an efficient method for obtaining summary stats of a grouped count. Here's a toy example:

    library(dplyr)
    df = tibble(pid = c(1,2,2,3,3,3,4,4,4,4), y = rnorm(10))
    df %>% group_by(pid) %>% count(pid)

which outputs the expected result:

    # A tibble: 4 × 2
    # Groups:   pid [4]
        pid     n
      <dbl> <int>
    1     1     1
    2     2     2
    3     3     3
    4     4     4

However, what if I want a summary of those grouped counts? Creating a new variable with `mutate()` or using `add_count()` hasn't worked, I assume because the vectors have different sizes. For instance:

    df %>% group_by(pid) %>% count(pid) %>% mutate(count = summary(n))

generates an error. What would be a simple way to generate summary statistics of the grouped counts (e.g., min, max, mean, etc.)?
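The size mismatch suspected above can be checked directly. This is a small sketch assuming dplyr 1.0 or later, where `count()` keeps the grouping of its input (as the `# Groups: pid [4]` line in the output shows). The names `counts` and `err` are illustrative only.

```r
library(dplyr)

# Toy data from the question (y is random and not used below)
df <- tibble(pid = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4), y = rnorm(10))

# count() keeps the grouping of its input, so counts is still grouped by pid
counts <- df %>% group_by(pid) %>% count(pid)

# Each pid group contains exactly one row ...
group_size(counts)

# ... but summary() returns six values (Min., 1st Qu., Median, Mean, 3rd Qu., Max.)
length(summary(counts$n))

# mutate() therefore rejects a size-6 result inside a size-1 group;
# tryCatch() captures the error so its message can be inspected
err <- tryCatch(
  counts %>% mutate(count = summary(n)),
  error = function(e) e
)
conditionMessage(err)
```

In other words, `mutate()` must return one value per row (or a single value to recycle) within each group, and `summary()` returns neither.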

Answer 1

Score: 3

`mutate()` is for adding columns to a data frame; you don't want that here. Instead, pull the column out of the data frame and summarize it:

    df %>%
      count(pid) %>%
      pull(n) %>%
      summary()
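If you would rather keep the statistics in a tibble (for example to join or export them), a `summarise()` call over the counts is one option. This is a sketch assuming dplyr 1.0 or later; the output column names (`min_n`, `max_n`, and so on) are arbitrary choices for illustration. The second part shows how the original `mutate()` attempt can work once the grouping left by `group_by(pid)` is removed with `ungroup()`.

```r
library(dplyr)

# Toy data from the question; only pid matters for the counts
df <- tibble(pid = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4), y = rnorm(10))

# One row per pid with its count in column n (ungrouped, since df is ungrouped)
counts <- df %>% count(pid)

# Collapse the counts into a single row of named summary statistics
count_stats <- counts %>%
  summarise(
    min_n    = min(n),
    max_n    = max(n),
    mean_n   = mean(n),
    median_n = median(n),
    sd_n     = sd(n)
  )
count_stats

# To attach a statistic to every pid instead, remove the grouping that
# group_by(pid) leaves behind so mean(n) is taken over all pids
counts_with_mean <- df %>%
  group_by(pid) %>%
  count(pid) %>%
  ungroup() %>%
  mutate(mean_n = mean(n))
counts_with_mean
```

Without `ungroup()`, `mean(n)` would be evaluated within each one-row `pid` group and simply return that group's own count.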

huangapple
  • Published on 8 February 2023 at 23:59:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75388448.html