R function to summarise using dplyr group_by with flexible groups, including no grouping at all
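Question
I want to write an R function that uses dplyr to summarise a data set and accepts a varying number of grouping variables for the `group_by` call, including no grouping at all. I have found answers to similar questions that use `group_by_`, but that function is deprecated (the dplyr version at the time of writing is 1.1.2).
I have tried several ways of passing vectors to `group_by` using tidy evaluation, but none worked as expected, and all failed to return a result when no grouping was required.
Here is the basis for a reproducible example using the starwars dataset. The function should be able to return summary tables of the body mass index (BMI) of the various creatures.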
```r
library(dplyr)

star_wars_BMI <- function(group_vec) {
  df_out <- starwars %>%
    mutate(BMI = height/mass^2) %>%
    group_by(group_vec) %>%
    summarise(height_mean = mean(height, na.rm = T),
              mass_mean = mean(mass, na.rm = T),
              BMI_mean = mean(BMI, na.rm = T))
  return(df_out)
}

group_vector0 <- c()                        # i.e. summarise for the whole galaxy
group_vector1 <- c("homeworld")             # summarise by homeworld planet
group_vector2 <- c("homeworld", "species")  # summarise by species on each homeworld

galaxy_BMI <- star_wars_BMI(group_vec = group_vector0)
homeworld_BMI <- star_wars_BMI(group_vec = group_vector1)
```
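I know it is relatively simple to write separate functions for the grouped and ungrouped cases, but I would like to know whether a single function can do this.
An explanation of the tidy evaluation rationale would be much appreciated, as would an example that goes on to plot the summaries.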
# Answer 1
**Score**: 3
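Here is another option that uses the ellipsis (`...`) to pass column names to `group_by`. Instead of a vector, we now pass the column names themselves:
`rlang::ensyms(...)` captures the column names as symbols, and `!!!` then unquotes and splices them into the `group_by` call: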
```R
library(dplyr)

star_wars_BMI <- function(...) {
  # Capture the arguments (strings or bare names) as a list of symbols
  group_vec <- rlang::ensyms(...)
  df_out <- starwars %>%
    mutate(BMI = height/mass^2) %>%
    group_by(!!!group_vec) %>%   # splice the symbols in as grouping columns
    summarise(height_mean = mean(height, na.rm = TRUE),
              mass_mean = mean(mass, na.rm = TRUE),
              BMI_mean = mean(BMI, na.rm = TRUE))
  return(df_out)
}

star_wars_BMI()
#>   height_mean mass_mean BMI_mean
#>         <dbl>     <dbl>    <dbl>
#> 1        174.      97.3   0.0481

star_wars_BMI("homeworld")
#> # A tibble: 49 × 4
#>    homeworld      height_mean mass_mean BMI_mean
#>    <chr>                <dbl>     <dbl>    <dbl>
#>  1 Alderaan              176.      64     0.0463
#>  2 Aleen Minor            79       15     0.351
#>  3 Bespin                175       79     0.0280
#>  4 Bestine IV            180      110     0.0149
#>  5 Cato Neimoidia        191       90     0.0236
#>  6 Cerea                 198       82     0.0294
#>  7 Champala              196      NaN   NaN
#>  8 Chandrila             150      NaN   NaN
#>  9 Concord Dawn          183       79     0.0293
#> 10 Corellia              175       78.5   0.0284
#> # ... with 39 more rows
#> # ℹ Use `print(n = ...)` to see more rows

star_wars_BMI("homeworld", "species")
#> `summarise()` has grouped output by 'homeworld'. You can override using the
#> `.groups` argument.
#> # A tibble: 58 × 5
#> # Groups:   homeworld [49]
#>    homeworld      species   height_mean mass_mean BMI_mean
#>    <chr>          <chr>           <dbl>     <dbl>    <dbl>
#>  1 Alderaan       Human            176.      64     0.0463
#>  2 Aleen Minor    Aleena            79       15     0.351
#>  3 Bespin         Human            175       79     0.0280
#>  4 Bestine IV     Human            180      110     0.0149
#>  5 Cato Neimoidia Neimodian        191       90     0.0236
#>  6 Cerea          Cerean           198       82     0.0294
#>  7 Champala       Chagrian         196      NaN   NaN
#>  8 Chandrila      Human            150      NaN   NaN
#>  9 Concord Dawn   Human            183       79     0.0293
#> 10 Corellia       Human            175       78.5   0.0284
#> # ... with 48 more rows
#> # ℹ Use `print(n = ...)` to see more rows
```
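To see why this works, it can help to look at what `ensyms()` captures and what `!!!` splices in, without running any grouping at all. The sketch below is only an illustration: `show_groups()` is a hypothetical helper, not part of dplyr or rlang. It builds the `group_by()` call that would result and returns it unevaluated.

```r
library(rlang)

# Hypothetical helper (not from the answer): capture the arguments as
# symbols, then build, but do not evaluate, the resulting group_by() call.
show_groups <- function(...) {
  group_syms <- ensyms(...)        # strings or bare names become symbols
  expr(group_by(!!!group_syms))    # !!! splices the list in as separate arguments
}

show_groups()
#> group_by()
show_groups("homeworld")
#> group_by(homeworld)
show_groups(homeworld, species)
#> group_by(homeworld, species)
```

With no arguments the spliced list is empty, so the call reduces to `group_by()`, which leaves the data ungrouped. That is why the same function covers the "no grouping" case.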
# Answer 2
**Score**: 2
Hope you are doing well.
I believe you can use `across`.
For example:
```r
library(dplyr)

star_wars_BMI <- function(group_vec) {
  df_out <- starwars %>%
    mutate(BMI = height/mass^2) %>%
    # all_of() marks group_vec as an external character vector of column names
    group_by(across(all_of(group_vec))) %>%
    summarise(height_mean = mean(height, na.rm = TRUE),
              mass_mean = mean(mass, na.rm = TRUE),
              BMI_mean = mean(BMI, na.rm = TRUE))
  return(df_out)
}

group_vector0 <- c()                        # i.e. summarise for the whole galaxy
group_vector1 <- c("homeworld")             # summarise by homeworld planet
group_vector2 <- c("homeworld", "species")  # summarise by species on each homeworld

star_wars_BMI(group_vec = group_vector0)
star_wars_BMI(group_vec = group_vector1)
star_wars_BMI(group_vec = group_vector2)
```
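The question also asked for an example that goes on to plot a summary. The following is a minimal sketch of one way to do that, assuming ggplot2 is installed. `summarise_bmi()` and `plot_mean_height()` are hypothetical names introduced here for illustration, and the BMI formula is kept exactly as written in the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: the across() approach, with the result ungrouped
# via .groups = "drop" so it is ready for plotting.
summarise_bmi <- function(group_vec = character(0)) {
  starwars %>%
    mutate(BMI = height / mass^2) %>%          # formula as given in the question
    group_by(across(all_of(group_vec))) %>%
    summarise(height_mean = mean(height, na.rm = TRUE),
              mass_mean   = mean(mass, na.rm = TRUE),
              BMI_mean    = mean(BMI, na.rm = TRUE),
              .groups = "drop")
}

# Hypothetical helper: bar chart of mean height for one grouping column.
plot_mean_height <- function(group_col) {
  summarise_bmi(group_col) %>%
    # drop groups with a missing label or no usable height values
    filter(!is.na(.data[[group_col]]), !is.na(height_mean)) %>%
    ggplot(aes(x = reorder(.data[[group_col]], height_mean),
               y = height_mean)) +
    geom_col() +
    coord_flip() +
    labs(x = group_col, y = "Mean height (cm)")
}

p <- plot_mean_height("species")
```

Printing `p` draws the chart. Because the grouping column arrives as a string, the `.data[[group_col]]` pronoun is used to refer to it inside `filter()` and `aes()`.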