How to calculate percentages within each group defined by a different variable?

Question

This is the R code for a dummy dataset:

    c <- c(10, 20, 30, 40, 50, 40, 2, 40, 10, 50)
    b <- c(40, 2, 40, 10, 50, 10, 20, 30, 40, 50)
    a <- c(10, 50, 3, 60, 100, 40, 2, 40, 10, 50)
    id <- c("a", "b", "b", "a", "c", "a", "b", "b", "a", "c")
    variation <- c("a3", "a3", "b1", "a2", "b1", "a3", "a1", "b1", "a1", "b1")
    data <- data.frame(id, a, b, c, variation)
    data  # print all 10 rows (head(data) would show only the first 6)
    #    id   a  b  c variation
    # 1   a  10 40 10        a3
    # 2   b  50  2 20        a3
    # 3   b   3 40 30        b1
    # 4   a  60 10 40        a2
    # 5   c 100 50 50        b1
    # 6   a  40 10 40        a3
    # 7   b   2 20  2        a1
    # 8   b  40 30 40        b1
    # 9   a  10 40 10        a1
    # 10  c  50 50 50        b1

I can calculate the percentages for a single id after filtering on it:

    library(dplyr)  # provides %>%, filter(), group_by(), count(), mutate(), arrange()

    data_filter <- data %>% filter(id == "a")
    data_filter
    #   id  a  b  c variation
    # 1  a 10 40 10        a3
    # 2  a 60 10 40        a2
    # 3  a 40 10 40        a3
    # 4  a 10 40 10        a1

    # Data transformation
    data_filter_percentage <- data_filter %>%
      group_by(variation) %>%  # variable to be transformed
      count() %>%
      ungroup() %>%
      mutate(perc = n / sum(n)) %>%
      arrange(perc) %>%
      mutate(labels = scales::percent(perc))
    head(data_filter_percentage)
    # # A tibble: 3 x 4
    #   variation     n  perc labels
    #   <chr>     <int> <dbl> <chr>
    # 1 a1            1  0.25 25%
    # 2 a2            1  0.25 25%
    # 3 a3            2  0.5  50%

My question: is it possible to run the above pipeline for every id at once, without filtering each one individually?

Answer 1

Score: 1

You can try the following workflow:

    library(dplyr)

    data %>%
      group_by(id) %>%
      count(variation) %>%  # the result is still grouped by id
      mutate(perc = n / sum(n), labels = scales::percent(perc)) %>%
      ungroup()

Or, more briefly (the `.by` argument requires dplyr 1.1.0 or later):

    data %>%
      count(id, variation) %>%
      mutate(perc = n / sum(n), labels = scales::percent(perc), .by = id)
    # # A tibble: 7 × 5
    #   id    variation     n  perc labels
    #   <chr> <chr>     <int> <dbl> <chr>
    # 1 a     a1            1  0.25 25%
    # 2 a     a2            1  0.25 25%
    # 3 a     a3            2  0.5  50%
    # 4 b     a1            1  0.25 25%
    # 5 b     a3            1  0.25 25%
    # 6 b     b1            2  0.5  50%
    # 7 c     b1            2  1    100%
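
As a cross-check that does not depend on dplyr, the same per-id proportions can be sketched in base R with `table()` and `prop.table()`. This is not part of the original answer, only a minimal sketch: the `result` data frame, its column names, and the rounding used to build the labels are choices made here for illustration.

```r
# Rebuild the dummy data from the question (only the two columns needed here).
id <- c("a", "b", "b", "a", "c", "a", "b", "b", "a", "c")
variation <- c("a3", "a3", "b1", "a2", "b1", "a3", "a1", "b1", "a1", "b1")
data <- data.frame(id, variation)

# Cross-tabulate id by variation, then convert each row to proportions.
counts <- table(data$id, data$variation)
props <- prop.table(counts, margin = 1)  # margin = 1: every id row sums to 1

# Reshape to long format; the first two columns are the id and the variation.
result <- as.data.frame(props, responseName = "perc", stringsAsFactors = FALSE)
names(result)[1:2] <- c("id", "variation")

# Drop id/variation combinations that never occur, then sort like the dplyr output.
result <- result[result$perc > 0, ]
result <- result[order(result$id, result$variation), ]
rownames(result) <- NULL

# Simple percentage labels (whole percentages, an assumption made for this sketch).
result$labels <- paste0(round(100 * result$perc), "%")
result
```

The `perc` and `labels` columns should match the seven rows shown above. Unlike `count()`, `table()` also produces zero cells for combinations that never occur (for example id `c` with variation `a1`), which is why they are filtered out before sorting.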

huangapple
  • Posted on July 18, 2023, 16:52:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76711042.html