Within dplyr::group_by, obtain the number of observations for ONE of multiple grouping variables
Question
It's very possible this has been asked before; however, I am having a very difficult time articulating my problem.
Within my data, I have 3 variables: LOCATION, TOPIC, and RESP (the response). I would like to calculate the distribution of each combination of TOPIC and RESP within each LOCATION.
Create toy data and perform initial data prep:
library(dplyr)  # provides %>% and the dplyr verbs used below
responses <- data.frame(LOCATION = c("LOC_A", "LOC_A", "LOC_A", "LOC_A", "LOC_A",
"LOC_A", "LOC_A", "LOC_A",
"LOC_B", "LOC_B", "LOC_B", "LOC_B", "LOC_B",
"LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C",
"LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C",
"LOC_C", "LOC_C", "LOC_C", "LOC_C", "LOC_C"),
TOPIC = c("Dogs", "Dogs", "Dogs", "Dogs", "Dogs", "Dogs",
"Lizards", "Lizards", "Lizards",
"Lizards", "Lizards", "Lizards", "Lizards", "Lizards",
"Lizards", "Lizards", "Snakes", "Snakes", "Snakes", "Snakes", "Snakes",
"Snakes", "Dogs", "Snakes", "Dogs", "Snakes", "Dogs",
"Snakes", "Dogs", "Snakes"),
RESP = c("Agree", "Disagree", "Agree", "Disagree", "Agree",
"Disagree", "Agree", "Disagree",
"Agree", "Disagree", "Agree", "Disagree", "Neither", "Agree",
"Neither", "Agree", "Neither", "Agree", "Neither",
"Agree", "Neither", "Agree", "Agree", "Neither",
"Agree", "Neither", "Agree", "Disagree", "Disagree",
"Neither"))
# Obtain counts for each combination of levels
distribution <- responses %>%
table() %>%
as.data.frame() %>%
# Make it more readable
dplyr::arrange(LOCATION, TOPIC, RESP)
Here is an example solution which uses a loop to create my desired output:
# ugly loop solution :(
# Initialize output container
out <- list()
# Iterate over each location
for(loc in unique(distribution$LOCATION)){
# Subset distribution for this location
thisDist <- dplyr::filter(distribution, LOCATION == loc)
# Calculate percent of each response for this location
thisDist$percent <- thisDist$Freq/sum(thisDist$Freq)
# Store distribution df with percent column
out[[loc]] <- thisDist
}
# combine output into single df
out <- do.call("rbind", out)
What I would like to have is a concise tidyverse solution. Here is some pseudo-code which describes my imaginary solution.
# Imaginary tidyverse solution :)
out <- distribution %>%
group_by(LOCATION, TOPIC, RESP) %>%
summarise(#percent = Freq/(sum(<all-Freq-values-for-this-group's-LOCATION-value>))
)
What I'm looking to do here is obtain the sum of all Freq values for the LOCATION value of the current group. Is there a nice way to do this within a group_by/summarise?
Thanks for reading, I hope this isn't completely inscrutable.
Answer 1
Score: 1
Is this what you're looking for?
distribution %>%
mutate(percent = Freq/sum(Freq), .by = LOCATION)
# LOCATION TOPIC RESP Freq percent
# 1 LOC_A Dogs Agree 3 0.37500000
# 2 LOC_A Dogs Disagree 3 0.37500000
# 3 LOC_A Dogs Neither 0 0.00000000
# 4 LOC_A Lizards Agree 1 0.12500000
# 5 LOC_A Lizards Disagree 1 0.12500000
# 6 LOC_A Lizards Neither 0 0.00000000
# 7 LOC_A Snakes Agree 0 0.00000000
# 8 LOC_A Snakes Disagree 0 0.00000000
# 9 LOC_A Snakes Neither 0 0.00000000
# 10 LOC_B Dogs Agree 0 0.00000000
# 11 LOC_B Dogs Disagree 0 0.00000000
# 12 LOC_B Dogs Neither 0 0.00000000
# 13 LOC_B Lizards Agree 2 0.40000000
# 14 LOC_B Lizards Disagree 2 0.40000000
# 15 LOC_B Lizards Neither 1 0.20000000
# 16 LOC_B Snakes Agree 0 0.00000000
# 17 LOC_B Snakes Disagree 0 0.00000000
# 18 LOC_B Snakes Neither 0 0.00000000
# 19 LOC_C Dogs Agree 3 0.17647059
# 20 LOC_C Dogs Disagree 1 0.05882353
# 21 LOC_C Dogs Neither 0 0.00000000
# 22 LOC_C Lizards Agree 2 0.11764706
# 23 LOC_C Lizards Disagree 0 0.00000000
# 24 LOC_C Lizards Neither 1 0.05882353
# 25 LOC_C Snakes Agree 3 0.17647059
# 26 LOC_C Snakes Disagree 1 0.05882353
# 27 LOC_C Snakes Neither 6 0.35294118
If your dplyr version is older than 1.1.0 (which introduced the .by argument), use group_by() instead:
distribution %>%
group_by(LOCATION) %>%
mutate(percent = Freq/sum(Freq))
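As a possible variation on the answer above, the counting step and the percentage step can be chained straight from the raw responses. This is only a minimal sketch: the `raw` data frame is a hypothetical miniature of the question's `responses`, and `count()` is swapped in for `table()`. One difference to keep in mind is that `count()` on character columns omits combinations that never occur, whereas `table()` keeps them with a frequency of 0.

```r
library(dplyr)

# Hypothetical raw data: one row per response, same columns as the question
raw <- data.frame(
  LOCATION = c("LOC_A", "LOC_A", "LOC_A", "LOC_B"),
  TOPIC    = c("Dogs", "Dogs", "Lizards", "Dogs"),
  RESP     = c("Agree", "Agree", "Disagree", "Agree")
)

out <- raw %>%
  # Count each observed LOCATION/TOPIC/RESP combination into a Freq column
  count(LOCATION, TOPIC, RESP, name = "Freq") %>%
  # Group by LOCATION only, so sum(Freq) is the total for that location
  group_by(LOCATION) %>%
  mutate(percent = Freq / sum(Freq)) %>%
  ungroup()

# Sanity check: shares within each location should add up to 1
check <- out %>%
  group_by(LOCATION) %>%
  summarise(total = sum(percent))
```

Grouping by LOCATION alone is what makes the denominator a per-location total; adding TOPIC or RESP to the grouping would make every percentage equal to 1.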
Answer 2
Score: 1
The key is to use mutate rather than summarise.
out <- distribution %>%
ungroup() %>%
group_by(LOCATION) %>%
mutate(percent = Freq/ sum(Freq))
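To illustrate why mutate is the better fit here, the sketch below contrasts the two verbs on a small hypothetical data frame (`toy` is not from the question). With the same grouping, summarise returns one row per group, while mutate keeps every original row and attaches the group-level result to each of them.

```r
library(dplyr)

# Hypothetical miniature of `distribution`, reduced to two columns
toy <- data.frame(
  LOCATION = c("LOC_A", "LOC_A", "LOC_B"),
  Freq     = c(3, 1, 2)
)

# summarise() collapses each LOCATION to a single row
collapsed <- toy %>%
  group_by(LOCATION) %>%
  summarise(total = sum(Freq))

# mutate() keeps all rows; sum(Freq) is evaluated within each LOCATION
kept <- toy %>%
  group_by(LOCATION) %>%
  mutate(percent = Freq / sum(Freq)) %>%
  ungroup()
```

Here `collapsed` has one row per location and `kept` has as many rows as `toy`, which is the shape the question asks for.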