Dynamically select multiple columns whose names are stored as variables
Question
I would like a function to be able to accept a tibble and a character vector indicating the column names of a variable number of columns in that tibble, and perform some operations such as group_by on it.
Here is an example that does it for 0, 1, or 2 columns:
library(tidyverse)
ex = crossing(abc=LETTERS[1:3], xyz=LETTERS[24:26]) %>% mutate(n = row_number())
group_flexibly = function(tbl, group_by_cols=character(0)) {
  if (length(group_by_cols)==0) {
    tbl %>%
      summarize(.groups='keep', mean_n = mean(n))
  } else if (length(group_by_cols)==1) {
    tbl %>%
      group_by(!!as.name(group_by_cols[1])) %>%
      summarize(.groups='keep', mean_n=mean(n))
  } else if (length(group_by_cols)==2) {
    tbl %>%
      group_by(!!as.name(group_by_cols[1]), !!as.name(group_by_cols[2])) %>%
      summarize(.groups='keep', mean_n=mean(n))
  }
}
group_flexibly(ex)
group_flexibly(ex, 'abc')
group_flexibly(ex, 'xyz')
group_flexibly(ex, c('abc','xyz'))
Output is as desired:
> group_flexibly(ex)
# A tibble: 1 × 1
  mean_n
   <dbl>
1      5
> group_flexibly(ex, 'abc')
# A tibble: 3 × 2
# Groups:   abc [3]
  abc   mean_n
  <chr>  <dbl>
1 A          2
2 B          5
3 C          8
> group_flexibly(ex, 'xyz')
# A tibble: 3 × 2
# Groups:   xyz [3]
  xyz   mean_n
  <chr>  <dbl>
1 X          4
2 Y          5
3 Z          6
> group_flexibly(ex, c('abc','xyz'))
# A tibble: 9 × 3
# Groups:   abc, xyz [9]
  abc   xyz   mean_n
  <chr> <chr>  <dbl>
1 A     X          1
2 A     Y          2
3 A     Z          3
4 B     X          4
5 B     Y          5
6 B     Z          6
7 C     X          7
8 C     Y          8
9 C     Z          9
So far so good. Now, how can I write a function that does this for a character vector of arbitrary length?
Here are two things that do not work:
group_by_cols = c('abc','xyz')
ex %>% group_by(!!as.name(group_by_cols)) %>% summarize(.groups='keep', mean_n=mean(n))
ex %>% group_by({{group_by_cols}}) %>% summarize(.groups='keep', mean_n=mean(n))
Problems encountered so far:
- !!as.name(group_by_cols) only uses group_by_cols[1] and ignores the rest of the vector.
- {{group_by_cols}} throws an error if length(group_by_cols) != 1.
- Popular StackOverflow discussions such as this do not address a need for the length of the vector of column names to be variable.
Answer 1
Score: 3
You're looking for across() and all_of():
group_flexibly <- function(tbl, grp_cols = character(0)) {
  tbl |>
    group_by(across(all_of(grp_cols))) |>
    summarise(mean_n = mean(n), .groups = 'keep')
}
The default value of character(0) handles the case of not providing any value to grp_cols.
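As a quick check, the function above can be run against the example data from the question. This is only a sketch: it assumes dplyr and tidyr are installed, uses the magrittr pipe so it does not depend on the R version, and the res_* names are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question: 9 rows, n runs from 1 to 9.
ex <- crossing(abc = LETTERS[1:3], xyz = LETTERS[24:26]) %>%
  mutate(n = row_number())

# The function from the answer: all_of() turns the character vector into
# a column selection, and across() hands those columns to group_by().
group_flexibly <- function(tbl, grp_cols = character(0)) {
  tbl %>%
    group_by(across(all_of(grp_cols))) %>%
    summarise(mean_n = mean(n), .groups = "keep")
}

res_none <- group_flexibly(ex)                   # no grouping columns
res_one  <- group_flexibly(ex, "abc")            # one grouping column
res_two  <- group_flexibly(ex, c("abc", "xyz"))  # two grouping columns
```

The same call works for any number of column names, so the length-based branching from the question is no longer needed.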
I recently learned that using pick() instead of across() is somewhat preferred. The difference is that if grp_cols is a named vector, across() will create new columns using those names, whereas pick(all_of(grp_cols)) or the .by argument suggested in a comment would both error on a named vector.
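The two alternatives mentioned here might look roughly like the sketch below. It assumes dplyr 1.1.0 or later, where pick() and .by were introduced; the function names group_with_pick and group_with_by are hypothetical, not from the original answer. Note that .by groups only for the single summarise() call, so its result comes back ungrouped rather than keeping the groups.

```r
library(dplyr)  # pick() and .by are assumed to need dplyr >= 1.1.0
library(tidyr)

ex <- crossing(abc = LETTERS[1:3], xyz = LETTERS[24:26]) %>%
  mutate(n = row_number())

# Variant 1 (hypothetical name): pick() in place of across().
group_with_pick <- function(tbl, grp_cols = character(0)) {
  tbl %>%
    group_by(pick(all_of(grp_cols))) %>%
    summarise(mean_n = mean(n), .groups = "keep")
}

# Variant 2 (hypothetical name): per-operation grouping via .by.
# The returned tibble is ungrouped.
group_with_by <- function(tbl, grp_cols = character(0)) {
  tbl %>%
    summarise(mean_n = mean(n), .by = all_of(grp_cols))
}

by_pick <- group_with_pick(ex, c("abc", "xyz"))
by_dot  <- group_with_by(ex, "abc")
```

Which variant fits best depends on whether the caller wants a grouped result back: the pick() version keeps the grouping, the .by version does not.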