How to write anonymous functions in R arrow across
Question
I have opened a .parquet dataset with the open_dataset() function from the arrow package. I want to use across() to clean several numeric columns at once. However, when I run this code:
start_numeric_cols = "sum"
sales <- sales %>% mutate(
  across(starts_with(start_numeric_cols) & (!where(is.numeric)),
         \(col) {replace(col, col == "NULL", 0) %>% as.numeric()}),
  across(starts_with(start_numeric_cols) & (where(is.numeric)),
         \(col) {replace(col, is.na(col), 0)})
)
#> Error in `across_setup()`:
#> ! Anonymous functions are not yet supported in Arrow
The error message is pretty informative, but I am wondering whether there is a way to do the same using only dplyr verbs within across() (or another workaround that does not require typing each column name).
Answer 1
Score: 3
arrow has a growing set of functions that can be used without pulling the data into R (listed at https://arrow.apache.org/docs/r/reference/acero.html), but replace() is not yet supported. However, you can use ifelse(), if_else(), or case_when() instead. Note also that purrr-style lambda functions (the ~ .x form) are supported where regular anonymous functions are not.
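To make the substitution concrete, here is a minimal sketch in plain R (no Arrow involved) showing that if_else() gives the same result as the replace() calls from the question. The vectors num_col and txt_col are made-up illustrations, not the asker's data.

```r
library(dplyr)

# Hypothetical example vectors standing in for two kinds of columns
num_col <- c(1.5, NA, 3)
txt_col <- c("1.5", "NULL", "3")

# Numeric case: replace() versus the if_else() form
via_replace <- replace(num_col, is.na(num_col), 0)
via_if_else <- if_else(is.na(num_col), 0, num_col)

# Text case: "0" (a string) keeps both if_else() branches the same type
txt_replace <- as.numeric(replace(txt_col, txt_col == "NULL", 0))
txt_if_else <- as.numeric(if_else(txt_col == "NULL", "0", txt_col))

identical(via_replace, via_if_else)
identical(txt_replace, txt_if_else)
```

Inside across(), the same expressions are written with .x in place of the vector name, for example ~ if_else(is.na(.x), 0, .x).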
I don't have your data, so I will use the iris dataset to demonstrate that the query builds successfully, even though the cleaning steps don't make complete sense for this data.
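library(arrow)
library(dplyr)

start_numeric_cols <- "P"

iris %>%
  as_arrow_table() %>%
  mutate(
    # Non-numeric columns: turn "NULL" strings into "0", then cast to numeric.
    # "0" is a string so both if_else() branches have the same type.
    across(
      starts_with(start_numeric_cols) & (!where(is.numeric)),
      ~ as.numeric(if_else(.x == "NULL", "0", .x))
    ),
    # Numeric columns: replace missing values with 0.
    across(
      starts_with(start_numeric_cols) & (where(is.numeric)),
      ~ if_else(is.na(.x), 0, .x)
    )
  )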
This builds the following query:
Table (query)
Sepal.Length: double
Sepal.Width: double
Petal.Length: double (if_else(is_null(Petal.Length, {nan_is_null=true}), 0, Petal.Length))
Petal.Width: double (if_else(is_null(Petal.Width, {nan_is_null=true}), 0, Petal.Width))
Species: dictionary<values=string, indices=int8>
See $.data for the source Arrow object
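The iris query above is only built, never run. As a rough end-to-end sketch, the same pattern can be executed with collect() on a small made-up table. Everything in it is an assumption for illustration: the data frame toy_sales, its column names, and the choice to split the work into two mutate() calls so the second step sees the columns already converted by the first. It also assumes an arrow version in which across() and where() are supported, as in the query shown above.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for the sales data: one text column that marks
# missing values with the string "NULL", and one numeric column with NA.
toy_sales <- data.frame(
  id = 1:3,
  sum_txt = c("1.5", "NULL", "3"),
  sum_a = c(1, NA, 3)
)

cleaned <- toy_sales %>%
  as_arrow_table() %>%
  # Step 1: text columns only. Swap "NULL" for "0", then cast to double.
  mutate(across(
    starts_with("sum") & !where(is.numeric),
    ~ as.numeric(if_else(.x == "NULL", "0", .x))
  )) %>%
  # Step 2: all sum* columns are numeric by now, so fill remaining NAs.
  mutate(across(
    starts_with("sum"),
    ~ if_else(is.na(.x), 0, .x)
  )) %>%
  # Run the query and bring the result back into R.
  collect()

cleaned
```

For a dataset opened with open_dataset(), the as_arrow_table() step would be dropped and the mutate() calls applied to the dataset object directly.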