Passing multiple dataframes into function to produce dataframe using map_dfr
Question
I have a function that takes a data frame and a column name. I call it for three column names with map_dfr, which collects the results into a single data frame, and this works fine.
library(tidyverse)
iris_median <- function(df, col) {
  df %>%
    summarise(Median = median(.data[[col]], na.rm = TRUE)) %>%
    mutate(field = as.character(!!col))
}
median_df <- map_dfr(c("Sepal.Length", "Sepal.Width", "Petal.Length"),
                     .f = iris_median,
                     df = iris)
When I try to pass multiple dataframes into this function as below:
iris1 <- iris[1:50, ]
iris2 <- iris[51:100, ]
median_df <- map_dfr("Sepal.Length", .f = iris_median,
                     df = c(iris1, iris2))
I get the error message: no applicable method for 'summarise' applied to an object of class "list"
I don't understand why it isn't just iterating through the two different dataframes and doing the calculation for each dataframe.
Can anyone amend the code?
Answer 1
Score: 4
map_dfr iterates through the items in its first argument (the .x argument). It does not iterate through the additional arguments passed via ...; those are handed unchanged to every call of .f.
Since the map_ functions are just wrappers for a loop, your call
map_dfr("Sepal.Length", .f = iris_median, df = c(iris1, iris2))
calls the iris_median function once (since the first argument has length 1) and passes the whole of c(iris1, iris2) to the df argument at once. This is equivalent to
iris_median(df = c(iris1, iris2), col = "Sepal.Length")
It is this call that causes your error. c(iris1, iris2) does not build a list of two data frames; it concatenates their columns into one plain list of ten vectors, and summarise has no method for a plain list.
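To see the difference for yourself, here is a minimal base R sketch comparing c() and list() on the two subsets. The names combined and wrapped are just illustrative and do not appear in the question.

```r
# Inspect what c() and list() produce when given two data frames.
iris1 <- iris[1:50, ]
iris2 <- iris[51:100, ]

combined <- c(iris1, iris2)     # concatenates the columns of both data frames
wrapped  <- list(iris1, iris2)  # keeps each data frame as one list element

class(combined)                 # "list", not "data.frame"
length(combined)                # 10: five columns from each data frame
length(wrapped)                 # 2: one element per data frame
is.data.frame(wrapped[[1]])     # TRUE
```

Only wrapped has the shape that a map_ function can iterate over one data frame at a time.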
To get map_dfr to work on a list of data frames, build the list with list() rather than c(), pass it as the first (.x) argument, and pass the single column name as an additional argument:
map_dfr(.x = list(iris1, iris2),
        .f = iris_median,
        col = "Sepal.Length")
#>   Median        field
#> 1    5.0 Sepal.Length
#> 2    5.9 Sepal.Length
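The result above does not say which data frame each row came from. One way to keep track, sketched here under the assumption that you are happy to name the list elements, is the .id argument of map_dfr. The labels first_50 and next_50 and the column name source are hypothetical choices for this example.

```r
library(dplyr)
library(purrr)

# Same function as in the question.
iris_median <- function(df, col) {
  df %>%
    summarise(Median = median(.data[[col]], na.rm = TRUE)) %>%
    mutate(field = as.character(!!col))
}

# Hypothetical labels for the two subsets; any names would do.
df_list <- list(first_50 = iris[1:50, ], next_50 = iris[51:100, ])

# .id stores the name of each list element in a new column called "source".
labelled <- map_dfr(.x = df_list,
                    .f = iris_median,
                    col = "Sepal.Length",
                    .id = "source")
labelled
```

Each row of labelled then carries the name of the list element it was computed from, alongside Median and field.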
If you want to iterate through both a list of data frames and a vector of column names, you need map2_dfr, which iterates through its first two arguments (.x and .y):
map2_dfr(.x = list(iris1, iris2),
         .y = c("Sepal.Length", "Petal.Width"),
         .f = iris_median)
#>   Median        field
#> 1    5.0 Sepal.Length
#> 2    1.3  Petal.Width
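Since the map_ functions are wrappers for a loop, it may help to see the two-argument case written out by hand. This is only a rough sketch of the idea, not how purrr is implemented internally; results and paired are illustrative names.

```r
library(dplyr)

# Same function as in the question.
iris_median <- function(df, col) {
  df %>%
    summarise(Median = median(.data[[col]], na.rm = TRUE)) %>%
    mutate(field = as.character(!!col))
}

df_list <- list(iris[1:50, ], iris[51:100, ])
col_vec <- c("Sepal.Length", "Petal.Width")

# Walk both inputs with a single index: element i of df_list
# is paired with element i of col_vec.
results <- vector("list", length(df_list))
for (i in seq_along(df_list)) {
  results[[i]] <- iris_median(df_list[[i]], col_vec[[i]])
}

# Stack the one-row results into a single data frame.
paired <- bind_rows(results)
paired
```

The loop makes two calls in total, one per position, which is why the output has two rows rather than four.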
Update
Note that map2_dfr iterates through the .x and .y arguments in parallel: the first data frame is paired with the first column name, the second with the second, and so on. If you want all possible combinations of the data frames and column names, you need to pass them through cross first.
df_list <- list(iris1, iris2)
col_vec <- c("Sepal.Length", "Sepal.Width")
cross(.l = list(df_list, col_vec)) %>%
  map_dfr(~ iris_median(.x[[1]], .x[[2]]))
#>   Median        field
#> 1    5.0 Sepal.Length
#> 2    5.9 Sepal.Length
#> 3    3.4  Sepal.Width
#> 4    2.8  Sepal.Width
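As far as I know, cross() has been deprecated since purrr 1.0.0 in favour of tidyr::expand_grid(), so on a recent installation you may prefer something like the sketch below. It builds the combinations from list indices rather than from the data frames themselves; the names grid, df_index and all_combos are my own and not part of any API.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Same function as in the question.
iris_median <- function(df, col) {
  df %>%
    summarise(Median = median(.data[[col]], na.rm = TRUE)) %>%
    mutate(field = as.character(!!col))
}

df_list <- list(iris[1:50, ], iris[51:100, ])
col_vec <- c("Sepal.Length", "Sepal.Width")

# One row per (data frame index, column name) combination.
grid <- expand_grid(df_index = seq_along(df_list), col = col_vec)

# Call iris_median once per row of the grid, then stack the results.
all_combos <- map2(grid$df_index, grid$col,
                   function(i, col) iris_median(df_list[[i]], col)) %>%
  bind_rows()
all_combos
```

Note that expand_grid varies its last input fastest, so the rows come out grouped by data frame (both columns for the first subset, then both for the second) rather than grouped by column as in the cross() output.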