July 10, 2023, 19:55:56

How to use across in an anonymous function in R

Question

I had no problems getting summary statistics from the tibble df with this script:

library(dplyr)
library(purrr)

set.seed(123)

df <- tibble(
  a = runif(5),
  b = runif(5)
)

funs <- lst(min, median, mean, max, sd)

sum_df1 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), .x, na.rm = TRUE)),
  .id = "statistic"
)

sum_df1

But the way I used across() here, passing na.rm = TRUE as an extra argument, is deprecated. So I tried the following, without success:

# Due to deprecation
sum_df2 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), \(x) na.rm = TRUE)),
  .id = "statistic"
)

# Problem: the result contains only Booleans
sum_df2

Answer 1

Score: 2

Here col refers to the column and .x refers to the function:

sum_df2 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), \(col) .x(col, na.rm = TRUE))),
  .id = "statistic"
)

identical(sum_df2, sum_df1)
## [1] TRUE
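
As for why the attempt in the question produced only logical values, a likely explanation is that the body of \(x) na.rm = TRUE never calls a summary function: it is just the assignment na.rm = TRUE, whose value is TRUE whatever the input. A minimal base R sketch of the difference (bad and good are hypothetical names used only for illustration):

```r
# Hypothetical names, for illustration only.
# The body is just the assignment `na.rm = TRUE`; x is never used.
bad <- \(x) na.rm = TRUE

# The body calls a summary function on x and forwards na.rm to it.
good <- \(x) mean(x, na.rm = TRUE)

bad_result <- bad(c(1, 2, NA))
good_result <- good(c(1, 2, NA))

print(bad_result)   # the value of the assignment, not a statistic
print(good_result)  # the mean of the non-missing values
```

This is why the working versions wrap the call explicitly: the anonymous function has to both receive the column and pass it on to the statistic.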

Or we can do it the other way around, where f is the function and .x is the column:

sum_df3 <- map_dfr(funs,
  \(f) summarize(df, across(where(is.numeric), ~ f(.x, na.rm = TRUE))),
  .id = "statistic"
)

identical(sum_df3, sum_df1)
## [1] TRUE

Or we could avoid ~ entirely, with f as the function and col as the column:

sum_df4 <- map_dfr(funs,
  \(f) summarize(df, across(where(is.numeric), \(col) f(col, na.rm = TRUE))),
  .id = "statistic"
)

identical(sum_df4, sum_df1)
## [1] TRUE

As an aside, ?map_dfr indicates that it has been superseded. Superseded is not the same as deprecated, so it is fine to keep using it, but bind_rows(map(...)) is preferred. With that approach, sum_df2 would be rewritten like this (and analogously for sum_df3 and sum_df4):

sum_df5 <- map(funs,
     ~ summarize(df, across(where(is.numeric), \(col) .x(col, na.rm = TRUE)))) |>
   bind_rows(.id = "statistic")

identical(sum_df5, sum_df1)
## [1] TRUE
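
If purrr 1.0.0 or later is installed, list_rbind() is another way to combine the per-function results; its names_to argument plays the role of .id. The sketch below rebuilds df and funs from the question so that it runs on its own. The name sum_df6 is introduced here and is not part of the original answer.

```r
library(dplyr)
library(purrr)

set.seed(123)

# Same data and function list as in the question.
df <- tibble(
  a = runif(5),
  b = runif(5)
)
funs <- lst(min, median, mean, max, sd)

# One summary row per function; the list names become the "statistic" column.
sum_df6 <- map(
  funs,
  \(f) summarize(df, across(where(is.numeric), \(col) f(col, na.rm = TRUE)))
) |>
  list_rbind(names_to = "statistic")

sum_df6
```

Because lst() names each element after the function, the statistic column is filled with min, median, mean, max, and sd without any extra work.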

Answer 2

Score: 1

You can use purrr::partial() to pre-fill additional arguments such as na.rm = TRUE, e.g.:

sum_df1 <- map_dfr(funs,
                   ~ summarize(df, across(where(is.numeric), partial(.x, na.rm = TRUE))),
                   .id = "statistic"
)


# A tibble: 5 × 3
  statistic     a      b
  <chr>     <dbl>  <dbl>
1 min       0.288 0.0456
2 median    0.788 0.528 
3 mean      0.662 0.495 
4 max       0.940 0.892 
5 sd        0.294 0.302
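
To see what partial() does on its own, here is a small sketch outside of across(). The names mean_narm, x, plain, and prefilled are hypothetical and used only for this illustration.

```r
library(purrr)

# partial() returns a new function with na.rm = TRUE already filled in.
mean_narm <- partial(mean, na.rm = TRUE)

x <- c(1, NA, 3)

plain <- mean(x)           # NA, because the missing value is not removed
prefilled <- mean_narm(x)  # equivalent to mean(x, na.rm = TRUE)

print(plain)
print(prefilled)
```

Inside across(), partial(.x, na.rm = TRUE) does the same thing for each function in funs, so across() receives a one-argument function and no extra arguments need to be passed through it.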
