How to use across in an anonymous function in R
Question
I had no problems getting statistics from the df tibble using this script:
library(dplyr)
library(purrr)
set.seed(123)
df <- tibble(
  a = runif(5),
  b = runif(5)
)
funs <- lst(min, median, mean, max, sd)
sum_df1 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), .x, na.rm = TRUE)),
  .id = "statistic"
)
sum_df1
But the way I used across() here, passing na.rm = TRUE through its ... argument, is deprecated. So I tried the following, without success:
# Due to deprecation
sum_df2 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), \(x) na.rm = TRUE)),
  .id = "statistic"
)
# Error: only Booleans
sum_df2
Answer 1
Score: 2
Here col refers to the column (the argument of the inner lambda) and .x refers to the function supplied by map_dfr():
sum_df2 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), \(col) .x(col, na.rm = TRUE))),
  .id = "statistic"
)
identical(sum_df2, sum_df1)
## [1] TRUE
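For context, the attempt in the question returns only logical values because the lambda \(x) na.rm = TRUE never calls a summary function: its body is just an assignment whose value is TRUE. A minimal base-R sketch of the difference follows; broken and fixed are illustrative names that do not appear in the question, and the braces are added only for readability.

```r
# Illustrative names only: `broken` and `fixed` are not from the question.
# `broken` mirrors the lambda from the question: x is ignored and the value
# of the assignment (TRUE) is returned, whatever the column contains.
broken <- function(x) {
  na.rm = TRUE
}

# `fixed` actually calls a summary function and passes na.rm on to it.
fixed <- function(col) mean(col, na.rm = TRUE)

v <- c(1, NA, 3)
print(broken(v))  # TRUE
print(fixed(v))   # 2
```

This is why the inner lambda in the answers above always wraps a call such as .x(col, na.rm = TRUE) rather than naming na.rm on its own.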
Or we can do it the other way around, where f is the function and .x is the column:
sum_df3 <- map_dfr(funs,
  \(f) summarize(df, across(where(is.numeric), ~ f(.x, na.rm = TRUE))),
  .id = "statistic"
)
identical(sum_df3, sum_df1)
## [1] TRUE
Or we could avoid ~ entirely and use two lambdas, where f is the function and col is the column:
sum_df4 <- map_dfr(funs,
  \(f) summarize(df, across(where(is.numeric), \(col) f(col, na.rm = TRUE))),
  .id = "statistic"
)
identical(sum_df4, sum_df1)
## [1] TRUE
As an aside, ?map_dfr indicates that it has been superseded. That means it is not deprecated, so it is OK to continue using it, but calling map() and binding the rows in a separate step, as in bind_rows(map(...)), is preferred (the purrr documentation itself points to list_rbind() for that step). If we did that, we would redo sum_df2 like this (and analogously for sum_df3 and sum_df4):
sum_df5 <- map(funs,
  ~ summarize(df, across(where(is.numeric), \(col) .x(col, na.rm = TRUE)))
) |>
  bind_rows(.id = "statistic")
identical(sum_df5, sum_df1)
## [1] TRUE
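For reference, the purrr documentation for the superseded map_dfr() points to map() followed by list_rbind(). Below is a sketch of that variant, assuming purrr 1.0.0 or later is installed. The name sum_df6 is introduced here for illustration, and df and funs are rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)
library(purrr)

# Same data and named function list as in the question.
set.seed(123)
df <- tibble(a = runif(5), b = runif(5))
funs <- lst(min, median, mean, max, sd)

# map() returns a named list of one-row tibbles; list_rbind() stacks them
# and stores the list names in the column given by names_to.
sum_df6 <- map(
  funs,
  \(f) summarize(df, across(where(is.numeric), \(col) f(col, na.rm = TRUE)))
) |>
  list_rbind(names_to = "statistic")

sum_df6
```

Here names_to plays the role that .id plays in map_dfr() and bind_rows().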
Answer 2
Score: 1
You can use purrr::partial() to pre-fill additional arguments such as na.rm = TRUE, e.g.:
sum_df1 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), partial(.x, na.rm = TRUE))),
  .id = "statistic"
)
# A tibble: 5 × 3
  statistic     a      b
  <chr>     <dbl>  <dbl>
1 min       0.288 0.0456
2 median    0.788 0.528
3 mean      0.662 0.495
4 max       0.940 0.892
5 sd        0.294 0.302
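To see what partial() does on its own, here is a small sketch outside of across(). The names mean_narm and sd_narm are made up for this illustration and are not part of purrr.

```r
library(purrr)

# partial() returns a new function with na.rm = TRUE already filled in.
# `mean_narm` and `sd_narm` are illustrative names, not purrr functions.
mean_narm <- partial(mean, na.rm = TRUE)
sd_narm <- partial(sd, na.rm = TRUE)

v <- c(1, NA, 3)
mean_narm(v)  # same as mean(v, na.rm = TRUE)
sd_narm(v)    # same as sd(v, na.rm = TRUE)
```

Inside across(), partial(.x, na.rm = TRUE) builds such a pre-filled function for each entry of funs, so no argument has to be passed through the deprecated ... of across().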