Within dplyr::group_by, obtain the number of observations for ONE of multiple grouping variables

Question

It's very possible this has been asked before, however I am having a very difficult time articulating my problem.

Within my data, I have 3 variables: LOCATION, TOPIC, and RESP (the response). I would like to calculate the distribution of each combination of TOPIC and RESP within each LOCATION.

Create toy data and perform initial data prep

    library(dplyr)

    responses <- data.frame(
      LOCATION = c("LOC_A", "LOC_A", "LOC_A", "LOC_A", "LOC_A",
                   "LOC_A", "LOC_A", "LOC_A",
                   "LOC_B", "LOC_B", "LOC_B", "LOC_B", "LOC_B",
                   "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C",
                   "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C",
                   "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C"),
      TOPIC = c("Dogs", "Dogs", "Dogs", "Dogs", "Dogs", "Dogs",
                "Lizards", "Lizards", "Lizards",
                "Lizards", "Lizards", "Lizards", "Lizards", "Lizards",
                "Lizards", "Lizards", "Snakes", "Snakes", "Snakes", "Snakes", "Snakes",
                "Snakes", "Dogs", "Snakes", "Dogs", "Snakes", "Dogs",
                "Snakes", "Dogs", "Snakes"),
      RESP = c("Agree", "Disagree", "Agree", "Disagree", "Agree",
               "Disagree", "Agree", "Disagree",
               "Agree", "Disagree", "Agree", "Disagree", "Neither", "Agree",
               "Neither", "Agree", "Neither", "Agree", "Neither",
               "Agree", "Neither", "Agree", "Agree", "Neither",
               "Agree", "Neither", "Agree", "Disagree", "Disagree",
               "Neither")
    )

    # Obtain counts for each combination of levels
    distribution <- responses %>%
      table() %>%
      as.data.frame() %>%
      # Make it more readable
      dplyr::arrange(LOCATION, TOPIC, RESP)

Here is an example solution which uses a loop to create my desired output:

    # ugly loop solution :(
    # Initialize output container
    out <- list()
    # Iterate over each location
    for (loc in unique(distribution$LOCATION)) {
      # Subset distribution for this location
      thisDist <- dplyr::filter(distribution, LOCATION == loc)
      # Calculate percent of each response for this location
      thisDist$percent <- thisDist$Freq / sum(thisDist$Freq)
      # Store distribution df with percent column
      out[[loc]] <- thisDist
    }
    # Combine output into single df
    out <- do.call("rbind", out)

What I would like to have is a concise tidyverse solution. Here is some pseudo-code which describes my imaginary solution.

    # Imaginary tidyverse solution :)
    out <- distribution %>%
      group_by(LOCATION, TOPIC, RESP) %>%
      summarise(
        # percent = Freq / sum(<all-Freq-values-for-this-group's-LOCATION-value>)
      )

What I'm looking to do here is obtain the sum of all Freq values for the LOCATION value of the current group. Is there a nice way to do this within a group_by/summarise?

Thanks for reading, I hope this isn't completely inscrutable.

Answer 1

Score: 1

Is this what you're looking for?

    distribution %>%
      mutate(percent = Freq / sum(Freq), .by = LOCATION)
    #    LOCATION   TOPIC     RESP Freq    percent
    # 1     LOC_A    Dogs    Agree    3 0.37500000
    # 2     LOC_A    Dogs Disagree    3 0.37500000
    # 3     LOC_A    Dogs  Neither    0 0.00000000
    # 4     LOC_A Lizards    Agree    1 0.12500000
    # 5     LOC_A Lizards Disagree    1 0.12500000
    # 6     LOC_A Lizards  Neither    0 0.00000000
    # 7     LOC_A  Snakes    Agree    0 0.00000000
    # 8     LOC_A  Snakes Disagree    0 0.00000000
    # 9     LOC_A  Snakes  Neither    0 0.00000000
    # 10    LOC_B    Dogs    Agree    0 0.00000000
    # 11    LOC_B    Dogs Disagree    0 0.00000000
    # 12    LOC_B    Dogs  Neither    0 0.00000000
    # 13    LOC_B Lizards    Agree    2 0.40000000
    # 14    LOC_B Lizards Disagree    2 0.40000000
    # 15    LOC_B Lizards  Neither    1 0.20000000
    # 16    LOC_B  Snakes    Agree    0 0.00000000
    # 17    LOC_B  Snakes Disagree    0 0.00000000
    # 18    LOC_B  Snakes  Neither    0 0.00000000
    # 19    LOC_C    Dogs    Agree    3 0.17647059
    # 20    LOC_C    Dogs Disagree    1 0.05882353
    # 21    LOC_C    Dogs  Neither    0 0.00000000
    # 22    LOC_C Lizards    Agree    2 0.11764706
    # 23    LOC_C Lizards Disagree    0 0.00000000
    # 24    LOC_C Lizards  Neither    1 0.05882353
    # 25    LOC_C  Snakes    Agree    3 0.17647059
    # 26    LOC_C  Snakes Disagree    1 0.05882353
    # 27    LOC_C  Snakes  Neither    6 0.35294118

The .by argument requires dplyr 1.1.0 or later. If you have an older dplyr, use group_by() instead:

    distribution %>%
      group_by(LOCATION) %>%
      mutate(percent = Freq / sum(Freq))
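
As a rough sketch of the same idea starting from raw responses rather than from the table() output, the counting step can also be done with dplyr's count(). The data frame `responses_small` and the result name `out_small` below are made up for illustration and are not part of the question's data. Note that, unlike table(), count() only returns combinations that actually occur, so zero-count rows would be missing.

```r
library(dplyr)

# Small hypothetical data set (not the question's full toy data)
responses_small <- data.frame(
  LOCATION = c("LOC_A", "LOC_A", "LOC_A", "LOC_B", "LOC_B"),
  TOPIC    = c("Dogs", "Dogs", "Lizards", "Lizards", "Lizards"),
  RESP     = c("Agree", "Disagree", "Agree", "Agree", "Agree")
)

out_small <- responses_small %>%
  # One row per observed LOCATION/TOPIC/RESP combination, count stored in n
  count(LOCATION, TOPIC, RESP) %>%
  # Group by LOCATION only, so sum(n) is the total for that location
  group_by(LOCATION) %>%
  mutate(percent = n / sum(n)) %>%
  # Drop the grouping so later steps are not affected by it
  ungroup()

print(out_small)
```

The group_by()/ungroup() form is used here so the sketch should also run on dplyr versions older than 1.1; with a newer dplyr, `.by = LOCATION` inside mutate() would be the shorter equivalent.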

Answer 2

Score: 1

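The key is to use mutate rather than summarise. summarise collapses each group to a single row, whereas mutate keeps every row and evaluates sum(Freq) within each LOCATION group.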

    out <- distribution %>%
      # Clear any existing grouping before regrouping by LOCATION only
      ungroup() %>%
      group_by(LOCATION) %>%
      mutate(percent = Freq / sum(Freq))
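
To make the summarise/mutate difference concrete, here is a minimal sketch on a made-up counts table. `dist_small`, `totals`, and `with_percent` are hypothetical names, and the numbers are invented rather than taken from the question's data.

```r
library(dplyr)

# Hypothetical counts table with the same kind of columns as `distribution`
dist_small <- data.frame(
  LOCATION = c("LOC_A", "LOC_A", "LOC_B", "LOC_B"),
  TOPIC    = c("Dogs", "Lizards", "Dogs", "Lizards"),
  Freq     = c(3, 1, 2, 2)
)

# summarise() collapses each group: one row per LOCATION
totals <- dist_small %>%
  group_by(LOCATION) %>%
  summarise(total = sum(Freq))

# mutate() keeps every row and reuses the group total on each of them
with_percent <- dist_small %>%
  group_by(LOCATION) %>%
  mutate(percent = Freq / sum(Freq)) %>%
  ungroup()

print(totals)
print(with_percent)
```

In this sketch `totals` has two rows (one per location), while `with_percent` keeps all four rows, and the percent values add up to 1 within each location.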

huangapple
  • Posted on August 11, 2023, 01:46:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76878179.html