Using map with a custom function (that returns a data frame) and multiple inputs

Question

I have a very simple function that returns a data frame. The function takes three parameters: a dataset and two variables that are present in the data frame.

I was hoping to use the map/pmap family of functions to feed a vector/list of inputs and produce a single (long) output dataset. I can't seem to get map/pmap tools to work for me. What can I try next?

The function is pretty basic:

library(dplyr)
library(purrr)  # needed for map2_dfr() below

# Function takes dataset and 2 categorical variables
# Calculates number of records for each combination of values in the two variables
# Calculates % of var 1 responses for each level of var2
ugh<-function(data, var1, var2){
# add checks to make sure vars on dataset

  tab_n<-data%>%
    group_by_at(c(var1, var2))%>%
    summarise(Numerator=n(), .groups="drop")%>%
    group_by_at(c(var2))%>%
    mutate(Denominator=sum(Numerator)
           , Pct=Numerator/Denominator*100
           # storing names of var1 and var2 for future subsetting
           , Var1=var1
           , Var2=var2)%>%
    rename(Var1_levels=var1
           , Var2_levels=var2
           )
}


# Sample output
combo1<-mtcars%>%ugh(var1="cyl", var2="gear")
# can also run this as:
# combo1<-ugh(data=mtcars, var1="cyl", var2="gear")
combo2<-mtcars%>%ugh(var1="cyl", var2="carb")
sampleOutput<-rbind(combo1, combo2)

# Trying to use map to generate sampleOutput
var1_vector=rep("cyl", 2)
var2_vector=c("gear", "carb")

plswork<-mtcars%>%
  map2_dfr(var1=var1_vector, var2=var2_vector, ugh)

The error message I get is:

> Error in as_mapper(.f, ...) : argument ".f" is missing, with no default

I've tried using ~ to specify the function and I've tried using map2 and binding rows separately, also tried pmap with a list of inputs... but am not having much luck.

(I am interested also in more efficient ways to summarise a subset of columns from a data frame by a different subset of columns.)

Answer 1

Score: 1

There are a few solutions to this problem. I recommend the first because I think it's the clearest. I also tweaked your function to replace the deprecated or superseded functions and behaviour it relied on.
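For context, the error in the question appears to come from argument matching. `map2_dfr()` takes its arguments in the order `.x`, `.y`, `.f`, `...`, so the piped `mtcars` fills `.x`, the named `var1 =` and `var2 =` arguments match no formal parameter and fall into `...`, and the unnamed `ugh` lands in `.y`. That leaves `.f` empty. The sketch below reproduces that matching with a hypothetical stand-in, `fake_map2()`, which is not part of purrr and only mimics the argument order.

```r
# Hypothetical stand-in that mimics the argument order of purrr::map2_dfr().
# It only reports which formal parameters received a value.
fake_map2 <- function(.x, .y, .f, ...) {
  c(
    .x = !missing(.x),
    .y = !missing(.y),
    .f = !missing(.f),
    dots = ...length() > 0
  )
}

# Same shape as the failing call: data first, named var1/var2, then a function.
# `sum` stands in for `ugh` here; any function would be matched the same way.
matched <- fake_map2(mtcars, var1 = c("cyl", "cyl"), var2 = c("gear", "carb"), sum)
matched
```

In this sketch `.x`, `.y` and `dots` come back `TRUE` while `.f` is `FALSE`, which mirrors the "argument `.f` is missing" message. The solutions in this answer avoid the problem by passing the two vectors as `.x` and `.y` and supplying the data inside the function call.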

library(dplyr)
library(purrr)

# Function takes dataset and 2 categorical variables
# Calculates number of records for each combination of values in the two variables
# Calculates % of var 1 responses for each level of var2
ugh<-function(data, var1, var2){
  # add checks to make sure vars on dataset

  tab_n<-data%>%
    group_by(across(all_of(c(var1, var2))))%>%
    summarise(Numerator=n(), .groups="drop")%>%
    group_by(across(all_of(var2)))%>%
    mutate(Denominator=sum(Numerator)
           , Pct=Numerator/Denominator*100
           # storing names of var1 and var2 for future subsetting
           , Var1= .env$var1
           , Var2= .env$var2)%>%
    rename(Var1_levels= all_of(var1)
           , Var2_levels= all_of(var2)
    )
}

# Sample output
combo1<-mtcars%>%ugh(var1="cyl", var2="gear")
# can also run this as:
# combo1<-ugh(data=mtcars, var1="cyl", var2="gear")
combo2<-mtcars%>%ugh(var1="cyl", var2="carb")

sampleOutput<-rbind(combo1, combo2)

sampleOutput
#> # A tibble: 17 × 7
#> # Groups:   Var2_levels [7]
#>    Var1_levels Var2_levels Numerator Denominator    Pct Var1  Var2
#>          <dbl>       <dbl>     <int>       <int>  <dbl> <chr> <chr>
#>  1           4           3         1          15   6.67 cyl   gear
#>  2           4           4         8          12  66.7  cyl   gear
#>  3           4           5         2           5  40    cyl   gear
#>  4           6           3         2          15  13.3  cyl   gear
#>  5           6           4         4          12  33.3  cyl   gear
#>  6           6           5         1           5  20    cyl   gear
#>  7           8           3        12          15  80    cyl   gear
#>  8           8           5         2           5  40    cyl   gear
#>  9           4           1         5           7  71.4  cyl   carb
#> 10           4           2         6          10  60    cyl   carb
#> 11           6           1         2           7  28.6  cyl   carb
#> 12           6           4         4          10  40    cyl   carb
#> 13           6           6         1           1 100    cyl   carb
#> 14           8           2         4          10  40    cyl   carb
#> 15           8           3         3           3 100    cyl   carb
#> 16           8           4         6          10  60    cyl   carb
#> 17           8           8         1           1 100    cyl   carb

# Trying to use map to generate sampleOutput
var1_vector=rep("cyl", 2)
var2_vector=c("gear", "carb")

# Method 1 (recommended): use of anonymous functions
map2(var1_vector, var2_vector, \(var1, var2) ugh(mtcars, var1, var2)) %>%
  list_rbind()
#> # A tibble: 17 × 7
#> # Groups:   Var2_levels [7]
#>    Var1_levels Var2_levels Numerator Denominator    Pct Var1  Var2
#>          <dbl>       <dbl>     <int>       <int>  <dbl> <chr> <chr>
#>  1           4           3         1          15   6.67 cyl   gear
#>  2           4           4         8          12  66.7  cyl   gear
#>  3           4           5         2           5  40    cyl   gear
#>  4           6           3         2          15  13.3  cyl   gear
#>  5           6           4         4          12  33.3  cyl   gear
#>  6           6           5         1           5  20    cyl   gear
#>  7           8           3        12          15  80    cyl   gear
#>  8           8           5         2           5  40    cyl   gear
#>  9           4           1         5           7  71.4  cyl   carb
#> 10           4           2         6          10  60    cyl   carb
#> 11           6           1         2           7  28.6  cyl   carb
#> 12           6           4         4          10  40    cyl   carb
#> 13           6           6         1           1 100    cyl   carb
#> 14           8           2         4          10  40    cyl   carb
#> 15           8           3         3           3 100    cyl   carb
#> 16           8           4         6          10  60    cyl   carb
#> 17           8           8         1           1 100    cyl   carb

# If your version of R predates the \(x) lambda shorthand (added in R 4.1.0),
# use a purrr formula instead:
map2(var1_vector, var2_vector, ~ ugh(mtcars, .x, .y)) %>%
  list_rbind()
#> # A tibble: 17 × 7
#> # Groups:   Var2_levels [7]
#>    Var1_levels Var2_levels Numerator Denominator    Pct Var1  Var2
#>          <dbl>       <dbl>     <int>       <int>  <dbl> <chr> <chr>
#>  1           4           3         1          15   6.67 cyl   gear
#>  2           4           4         8          12  66.7  cyl   gear
#>  3           4           5         2           5  40    cyl   gear
#>  4           6           3         2          15  13.3  cyl   gear
#>  5           6           4         4          12  33.3  cyl   gear
#>  6           6           5         1           5  20    cyl   gear
#>  7           8           3        12          15  80    cyl   gear
#>  8           8           5         2           5  40    cyl   gear
#>  9           4           1         5           7  71.4  cyl   carb
#> 10           4           2         6          10  60    cyl   carb
#> 11           6           1         2           7  28.6  cyl   carb
#> 12           6           4         4          10  40    cyl   carb
#> 13           6           6         1           1 100    cyl   carb
#> 14           8           2         4          10  40    cyl   carb
#> 15           8           3         3           3 100    cyl   carb
#> 16           8           4         6          10  60    cyl   carb
#> 17           8           8         1           1 100    cyl   carb

# Alternatively, using pmap():
# the list names must match ugh()'s argument names (var1, var2)
args <- list(
  var1 = var1_vector,
  var2 = var2_vector
)

# data is passed by name so it cannot be confused with var1 or var2
pmap(args, ugh, data = mtcars) %>%
  list_rbind()
#> # A tibble: 17 × 7
#> # Groups:   Var2_levels [7]
#>    Var1_levels Var2_levels Numerator Denominator    Pct Var1  Var2
#>          <dbl>       <dbl>     <int>       <int>  <dbl> <chr> <chr>
#>  1           4           3         1          15   6.67 cyl   gear
#>  2           4           4         8          12  66.7  cyl   gear
#>  3           4           5         2           5  40    cyl   gear
#>  4           6           3         2          15  13.3  cyl   gear
#>  5           6           4         4          12  33.3  cyl   gear
#>  6           6           5         1           5  20    cyl   gear
#>  7           8           3        12          15  80    cyl   gear
#>  8           8           5         2           5  40    cyl   gear
#>  9           4           1         5           7  71.4  cyl   carb
#> 10           4           2         6          10  60    cyl   carb
#> 11           6           1         2           7  28.6  cyl   carb
#> 12           6           4         4          10  40    cyl   carb
#> 13           6           6         1           1 100    cyl   carb
#> 14           8           2         4          10  40    cyl   carb
#> 15           8           3         3           3 100    cyl   carb
#> 16           8           4         6          10  60    cyl   carb
#> 17           8           8         1           1 100    cyl   carb
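
On the side question about summarising one subset of columns by another, and on the "add checks" placeholder in the function, here is one possible sketch rather than a definitive approach. It assumes dplyr and purrr 1.0 or later and R 4.1 or later. The helper name `pct_table()` and the choice of `am` as an extra variable are my own illustrations, not from the question. The idea is to build every pairing with base R's `expand.grid()` and feed the rows to `pmap()`.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: same summary as ugh(), plus basic input checks.
pct_table <- function(data, var1, var2) {
  stopifnot(
    is.data.frame(data),
    is.character(var1), length(var1) == 1,
    is.character(var2), length(var2) == 1
  )
  missing_vars <- setdiff(c(var1, var2), names(data))
  if (length(missing_vars) > 0) {
    stop("Columns not found in data: ", paste(missing_vars, collapse = ", "))
  }

  data %>%
    group_by(across(all_of(c(var1, var2)))) %>%
    summarise(Numerator = n(), .groups = "drop") %>%
    group_by(across(all_of(var2))) %>%
    mutate(
      Denominator = sum(Numerator),
      Pct = Numerator / Denominator * 100,
      Var1 = .env$var1,
      Var2 = .env$var2
    ) %>%
    ungroup() %>%  # drop grouping so the pieces bind into a plain tibble
    rename(Var1_levels = all_of(var1), Var2_levels = all_of(var2))
}

# Every pairing of the "row" variables with the "by" variables.
combos <- expand.grid(
  var1 = c("cyl", "am"),
  var2 = c("gear", "carb"),
  stringsAsFactors = FALSE
)

# pmap() walks the rows of combos; column names match the lambda's arguments.
result <- pmap(combos, \(var1, var2) pct_table(mtcars, var1, var2)) %>%
  list_rbind()
```

Binding the pieces only works when the level columns share a type across pairings, as they do for these numeric `mtcars` columns. With mixed types you would probably need to convert the level columns to character first.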

huangapple
  • Published on April 11, 2023, 11:47:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75982251.html