August 11, 2023, 01:46:56

Within dplyr::group_by, obtain the number of observations for ONE of multiple grouping variables

Question

It's very possible this has been asked before, however I am having a very difficult time articulating my problem.

Within my data, I have three variables: LOCATION, TOPIC, and RESP (the response). I would like to calculate the distribution of each combination of TOPIC and RESP within each LOCATION.

# Create toy data and perform initial data prep
library(dplyr)

responses <- data.frame(
  LOCATION = c("LOC_A", "LOC_A", "LOC_A", "LOC_A", "LOC_A",
               "LOC_A", "LOC_A", "LOC_A",
               "LOC_B", "LOC_B", "LOC_B", "LOC_B", "LOC_B",
               "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C",
               "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C",
               "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C"),
  TOPIC = c("Dogs", "Dogs", "Dogs", "Dogs", "Dogs", "Dogs",
            "Lizards", "Lizards", "Lizards",
            "Lizards", "Lizards", "Lizards", "Lizards", "Lizards",
            "Lizards", "Lizards", "Snakes", "Snakes", "Snakes", "Snakes", "Snakes",
            "Snakes", "Dogs", "Snakes", "Dogs", "Snakes", "Dogs",
            "Snakes", "Dogs", "Snakes"),
  RESP = c("Agree", "Disagree", "Agree", "Disagree", "Agree",
           "Disagree", "Agree", "Disagree",
           "Agree", "Disagree", "Agree", "Disagree", "Neither", "Agree",
           "Neither", "Agree", "Neither", "Agree", "Neither",
           "Agree", "Neither", "Agree", "Agree", "Neither",
           "Agree", "Neither", "Agree", "Disagree", "Disagree",
           "Neither")
)

# Obtain counts for each combination of levels
distribution <- responses %>%
  table() %>%
  as.data.frame() %>%
  # Make it more readable
  dplyr::arrange(LOCATION, TOPIC, RESP)

Here is an example solution which uses a loop to create my desired output:

# ugly loop solution :(
# Initialize output container
out <- list()
# Iterate over each location
for (loc in unique(distribution$LOCATION)) {
  # Subset distribution for this location
  thisDist <- dplyr::filter(distribution, LOCATION == loc)
  # Calculate percent of each response for this location
  thisDist$percent <- thisDist$Freq / sum(thisDist$Freq)
  # Store distribution df with percent column
  out[[loc]] <- thisDist
}
# Combine output into single df
out <- do.call("rbind", out)

What I would like to have is a concise tidyverse solution. Here is some pseudo-code which describes my imaginary solution.

# Imaginary tidyverse solution :)
out <- distribution %>%
  group_by(LOCATION, TOPIC, RESP) %>%
  summarise(#percent = Freq/(sum(<all-Freq-values-for-this-group's-LOCATION-value>))
  )

What I'm looking to do here is obtain the sum of all Freq values for the LOCATION value of the current group. Is there a nice way to do this within a group_by/summarise?

Thanks for reading, I hope this isn't completely inscrutable.

Answer 1

Score: 1

Is this what you're looking for?

distribution %&gt;%
  mutate(percent = Freq/sum(Freq), .by = LOCATION)
#    LOCATION   TOPIC     RESP Freq    percent
# 1     LOC_A    Dogs    Agree    3 0.37500000
# 2     LOC_A    Dogs Disagree    3 0.37500000
# 3     LOC_A    Dogs  Neither    0 0.00000000
# 4     LOC_A Lizards    Agree    1 0.12500000
# 5     LOC_A Lizards Disagree    1 0.12500000
# 6     LOC_A Lizards  Neither    0 0.00000000
# 7     LOC_A  Snakes    Agree    0 0.00000000
# 8     LOC_A  Snakes Disagree    0 0.00000000
# 9     LOC_A  Snakes  Neither    0 0.00000000
# 10    LOC_B    Dogs    Agree    0 0.00000000
# 11    LOC_B    Dogs Disagree    0 0.00000000
# 12    LOC_B    Dogs  Neither    0 0.00000000
# 13    LOC_B Lizards    Agree    2 0.40000000
# 14    LOC_B Lizards Disagree    2 0.40000000
# 15    LOC_B Lizards  Neither    1 0.20000000
# 16    LOC_B  Snakes    Agree    0 0.00000000
# 17    LOC_B  Snakes Disagree    0 0.00000000
# 18    LOC_B  Snakes  Neither    0 0.00000000
# 19    LOC_C    Dogs    Agree    3 0.17647059
# 20    LOC_C    Dogs Disagree    1 0.05882353
# 21    LOC_C    Dogs  Neither    0 0.00000000
# 22    LOC_C Lizards    Agree    2 0.11764706
# 23    LOC_C Lizards Disagree    0 0.00000000
# 24    LOC_C Lizards  Neither    1 0.05882353
# 25    LOC_C  Snakes    Agree    3 0.17647059
# 26    LOC_C  Snakes Disagree    1 0.05882353
# 27    LOC_C  Snakes  Neither    6 0.35294118

If you have dplyr older than 1.1, then use

distribution %>%
  group_by(LOCATION) %>%
  mutate(percent = Freq/sum(Freq))
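
As a variation on the answer above, the per-location share can also be computed straight from the raw rows, skipping the table() step. The following is only a minimal sketch under stated assumptions: `toy` is a small hypothetical data frame (not the question's 30 rows), and `shares` and `check` are illustrative names. Note that count() lists only combinations that actually occur, so the zero-count rows produced by table() will not appear.

```r
library(dplyr)

# Hypothetical miniature of the question's data (assumption, not the original rows)
toy <- data.frame(
  LOCATION = c("LOC_A", "LOC_A", "LOC_A", "LOC_B", "LOC_B"),
  TOPIC    = c("Dogs", "Dogs", "Lizards", "Lizards", "Lizards"),
  RESP     = c("Agree", "Disagree", "Agree", "Agree", "Agree")
)

# count() tallies each observed LOCATION/TOPIC/RESP combination into Freq;
# the grouped mutate() then divides by the total for that row's LOCATION
shares <- toy %>%
  count(LOCATION, TOPIC, RESP, name = "Freq") %>%
  group_by(LOCATION) %>%
  mutate(percent = Freq / sum(Freq)) %>%
  ungroup()

# Sanity check: the shares within each location should add up to 1
check <- shares %>%
  group_by(LOCATION) %>%
  summarise(total = sum(percent))

print(shares)
print(check)
```

The group_by()/mutate() form is used here so the sketch does not depend on the `.by` argument, which needs dplyr 1.1 or newer.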

Answer 2

Score: 1

The key is not to use summarise but mutate.

out <- distribution %>%
  ungroup() %>%
  group_by(LOCATION) %>%
  mutate(percent = Freq / sum(Freq))

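
To illustrate why mutate rather than summarise is the right verb here, the sketch below runs both on the same grouped data. It assumes a small hypothetical `counts` data frame with the same kind of columns as `distribution`; the names `counts`, `totals`, and `with_share` are made up for this example.

```r
library(dplyr)

# Hypothetical counts table (assumption: stands in for `distribution`)
counts <- data.frame(
  LOCATION = c("LOC_A", "LOC_A", "LOC_B"),
  TOPIC    = c("Dogs", "Lizards", "Dogs"),
  Freq     = c(3, 1, 2)
)

# summarise() collapses each group to a single row, so the per-row detail is lost
totals <- counts %>%
  group_by(LOCATION) %>%
  summarise(total = sum(Freq))

# mutate() keeps every row; sum(Freq) is evaluated per group and recycled
with_share <- counts %>%
  group_by(LOCATION) %>%
  mutate(percent = Freq / sum(Freq)) %>%
  ungroup()

nrow(totals)      # one row per LOCATION
nrow(with_share)  # same number of rows as `counts`
```

Because the group total is recycled across the rows of its group, each row can be divided by the total of its own LOCATION without leaving the pipeline.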
