How to write anonymous functions in arrow's across() in R

Question

I have opened a .parquet dataset with the open_dataset() function from the arrow package. I want to use across() to clean several numeric columns at once. However, when I run this code:

start_numeric_cols = "sum"
sales <- sales %>% mutate(
  across(starts_with(start_numeric_cols) & (!where(is.numeric)),
         \(col) {replace(col, col == "NULL", 0) %>% as.numeric()}),
  across(starts_with(start_numeric_cols) & (where(is.numeric)),
         \(col) {replace(col, is.na(col), 0)})
)
#> Error in `across_setup()`:
#> ! Anonymous functions are not yet supported in Arrow

The error message is pretty informative, but I am wondering whether there is a way to do the same using only dplyr verbs within across() (or another workaround that does not require typing each column name).

Answer 1

Score: 3

The arrow package has a growing set of functions that can be used without pulling the data into R (listed at https://arrow.apache.org/docs/r/reference/acero.html), but replace() is not yet supported. However, you can use ifelse(), if_else(), or case_when() instead. Note also that purrr-style lambda functions (~ .x) are supported in across(), whereas regular anonymous functions (function(col) or \(col)) are not.

I don't have your data, so I will use the iris dataset to demonstrate that the query builds successfully, even if it doesn't make complete sense for this data (iris has no non-numeric column starting with "P", so the first across() selects nothing here).

library(arrow)
library(dplyr)

start_numeric_cols <- "P"

iris %>%
  as_arrow_table() %>%
  mutate(
    # Text columns: swap "NULL" for "0" (a string, so both if_else()
    # branches share a type), then cast to numeric
    across(
      starts_with(start_numeric_cols) & (!where(is.numeric)),
      ~ as.numeric(if_else(.x == "NULL", "0", .x))
    ),
    # Numeric columns: replace missing values with 0
    across(
      starts_with(start_numeric_cols) & (where(is.numeric)),
      ~ if_else(is.na(.x), 0, .x)
    )
  )

This produces the following query:

Table (query)
Sepal.Length: double
Sepal.Width: double
Petal.Length: double (if_else(is_null(Petal.Length, {nan_is_null=true}), 0, Petal.Length))
Petal.Width: double (if_else(is_null(Petal.Width, {nan_is_null=true}), 0, Petal.Width))
Species: dictionary<values=string, indices=int8>

See $.data for the source Arrow object
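
The query above is lazy: printing it only shows the planned expressions, and nothing is computed until collect() is called. As a rough check on data shaped more like the question's, the sketch below runs the same two across() calls on a small in-memory table. The column names sum_a and sum_b and their values are made up for illustration. The first if_else() is given "0" as a string so that both branches have the same type before the cast. That is an assumption about what the real columns need and has not been tested on the original dataset.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for the question's data: one text column that uses
# the string "NULL" for missing values, and one numeric column with an NA.
sales <- as_arrow_table(data.frame(
  sum_a = c("1", "NULL", "3"),
  sum_b = c(1, NA, 3)
))

cleaned <- sales %>%
  mutate(
    # Text columns: swap "NULL" for "0" (a string, matching the column type),
    # then cast the whole column to numeric.
    across(
      starts_with("sum") & (!where(is.numeric)),
      ~ as.numeric(if_else(.x == "NULL", "0", .x))
    ),
    # Numeric columns: replace missing values with 0.
    across(
      starts_with("sum") & (where(is.numeric)),
      ~ if_else(is.na(.x), 0, .x)
    )
  ) %>%
  collect() # run the query and return the result as a data frame

print(cleaned)
```

With a dataset opened through open_dataset(), the same mutate() call should apply unchanged. Only the starts_with() prefix would need to match the real column names.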

huangapple
  • Posted on April 10, 2023 at 18:22:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75976247.html