Passing multiple dataframes into function to produce dataframe using map_dfr

Question

I have a function that takes a data frame and a column name. Calling it through map_dfr with three column names and one data frame, and collecting the results into a data frame, works fine.

library(tidyverse)

iris_median <- function(df, col) {
    df %>%
        summarise(Median = median(.data[[col]], na.rm = TRUE)) %>%
        mutate(field = as.character(!!col))
}

median_df <- map_dfr(c("Sepal.Length", "Sepal.Width", "Petal.Length"),
                     .f = iris_median,
                     df = iris)

When I try to pass multiple dataframes into this function as below:

iris1 <- iris[1:50, ]
iris2 <- iris[51:100, ]

median_df <- map_dfr("Sepal.Length", .f = iris_median,
                     df = c(iris1, iris2))

I get the error message: no applicable method for 'summarise' applied to an object of class "list"

I don't understand why it isn't just iterating through the two different dataframes and doing the calculation for each dataframe.

Can anyone amend the code?

Answer 1

Score: 4

map_dfr iterates over the elements of its first argument (the .x argument). It does not iterate over additional arguments passed via the dots (...); those are handed to .f unchanged on every call.

Since the map_ functions are just wrappers for a loop, your call

map_dfr("Sepal.Length", .f = iris_median, df = c(iris1, iris2))

calls the iris_median function once (since the first argument has length 1), and passes the whole list to the df argument at once. This is equivalent to

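iris_median(df = c(iris1, iris2), col = "Sepal.Length")

It is this call that causes your error. c(iris1, iris2) does not build a list of two data frames: it concatenates their columns into one plain list of 10 vectors, and summarise has no method for a plain list.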
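To make the difference concrete, here is a small base R sketch comparing c() and list() on the two subsets from the question. The variable names combined and wrapped are just illustrative.

```r
iris1 <- iris[1:50, ]
iris2 <- iris[51:100, ]

combined <- c(iris1, iris2)     # concatenates the columns of both data frames
wrapped  <- list(iris1, iris2)  # keeps each data frame as one list element

class(combined)              # "list"
length(combined)             # 10: five columns from each data frame
is.data.frame(combined)      # FALSE, so summarise() has nothing to dispatch on
length(wrapped)              # 2
is.data.frame(wrapped[[1]])  # TRUE
```

So list() is what you want whenever a collection of data frames has to be iterated over.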

To get map_dfr to work on a list of data frames, you need to pass the list as the first (.x) argument, and the single column name as an additional argument:

map_dfr(.x = list(iris1, iris2),
        .f = iris_median,
        col = "Sepal.Length")
#>   Median        field
#> 1    5.0 Sepal.Length
#> 2    5.9 Sepal.Length
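The result above does not say which data frame each row came from. One possible way to keep track, sketched here under the assumption that you are free to name the list elements, is the .id argument of map_dfr. The names setosa and versicolor and the column name source are my own choices, not part of the question.

```r
library(dplyr)
library(purrr)

# Same function as in the question.
iris_median <- function(df, col) {
  df %>%
    summarise(Median = median(.data[[col]], na.rm = TRUE)) %>%
    mutate(field = as.character(!!col))
}

# Naming the list elements lets .id record where each row came from.
df_list <- list(setosa = iris[1:50, ], versicolor = iris[51:100, ])

labelled <- map_dfr(df_list, iris_median, col = "Sepal.Length", .id = "source")
labelled
# Medians are 5.0 (setosa) and 5.9 (versicolor), as in the output above,
# with an extra "source" column holding the list names.
```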

If you want to iterate through both a list of data frames and a vector of column names you need map2_dfr, which iterates through the first two arguments.

map2_dfr(.x = list(iris1, iris2),
         .y = c("Sepal.Length", "Petal.Width"),
         .f = iris_median)
#>   Median        field
#> 1    5.0 Sepal.Length
#> 2    1.3  Petal.Width

Update

Note that map2_dfr iterates through the .x and .y arguments in parallel. If you want all possible combinations of the data frames and column names, you would need to pass the data frames and column names through cross first.


df_list <- list(iris1, iris2)
col_vec <- c("Sepal.Length", "Sepal.Width")

cross(.l = list(df_list, col_vec)) %>%
  map_dfr(~ iris_median(.x[[1]], .x[[2]]))
#>   Median        field
#> 1    5.0 Sepal.Length
#> 2    5.9 Sepal.Length
#> 3    3.4  Sepal.Width
#> 4    2.8  Sepal.Width
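As far as I know, cross() has been deprecated since purrr 1.0.0, so the call above may emit a warning on newer installations. A sketch of one alternative that avoids it is to build the combinations with base R's expand.grid over list indices and iterate with map2_dfr. The names grid, df_idx and all_medians are illustrative only.

```r
library(dplyr)
library(purrr)

# Same function as in the question.
iris_median <- function(df, col) {
  df %>%
    summarise(Median = median(.data[[col]], na.rm = TRUE)) %>%
    mutate(field = as.character(!!col))
}

df_list <- list(iris[1:50, ], iris[51:100, ])
col_vec <- c("Sepal.Length", "Sepal.Width")

# One row per (data frame index, column name) pair; the first factor varies fastest.
grid <- expand.grid(df_idx = seq_along(df_list),
                    col = col_vec,
                    stringsAsFactors = FALSE)

# Iterate over the index and column name in parallel, looking up each data frame.
all_medians <- map2_dfr(grid$df_idx, grid$col,
                        ~ iris_median(df_list[[.x]], .y))
all_medians
# Medians come out as 5.0, 5.9, 3.4, 2.8, in the same order as the cross() version.
```

Indexing into df_list, rather than putting the data frames themselves in the grid, keeps grid an ordinary two-column data frame.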

huangapple
  • Posted on July 18, 2023 at 00:05:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76706277.html