Using map with a custom function (that returns a data frame) and multiple inputs
Question
I have a very simple function that returns a data frame. It takes three parameters: a dataset and the names of two variables present in that data frame.
I was hoping to use the map/pmap family of functions to feed a vector/list of inputs and produce a single (long) output dataset. I can't seem to get map/pmap tools to work for me. What can I try next?
The function is pretty basic:
library(dplyr)
# Function takes dataset and 2 categorical variables
# Calculates number of records for each combination of values in the two variables
# Calculates % of var 1 responses for each level of var2
ugh <- function(data, var1, var2) {
  # add checks to make sure vars on dataset
  tab_n <- data %>%
    group_by_at(c(var1, var2)) %>%
    summarise(Numerator = n(), .groups = "drop") %>%
    group_by_at(c(var2)) %>%
    mutate(Denominator = sum(Numerator),
           Pct = Numerator / Denominator * 100,
           # storing names of var1 and var2 for future subsetting
           Var1 = var1,
           Var2 = var2) %>%
    rename(Var1_levels = var1,
           Var2_levels = var2)
}
# Sample output
combo1<-mtcars%>%ugh(var1="cyl", var2="gear")
# can also run this as:
# combo1<-ugh(data=mtcars, var1="cyl", var2="gear")
combo2<-mtcars%>%ugh(var1="cyl", var2="carb")
sampleOutput<-rbind(combo1, combo2)
# Trying to use map to generate sampleOutput
var1_vector=rep("cyl", 2)
var2_vector=c("gear", "carb")
plswork<-mtcars%>%
map2_dfr(var1=var1_vector, var2=var2_vector, ugh)
The error message I get is:
> Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
I've tried using ~ to specify the function, using map2 and binding the rows separately, and using pmap with a list of inputs, but I'm not having much luck.
(I am interested also in more efficient ways to summarise a subset of columns from a data frame by a different subset of columns.)
Answer 1
Score: 1
There are a few solutions to this problem. I recommend the first because I think it's the clearest. I also tweaked your function to replace deprecated or superseded functions and behaviour.
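As a rough illustration of one of those tweaks, the sketch below shows why `.env$var1` is safer than a bare `var1` inside `mutate()`. The data frame `df` and the helpers `label_bare()` and `label_env()` are hypothetical names invented for this example; they are not part of the original function. The assumption is that the data might contain a column with the same name as a function argument, in which case data masking picks the column first.

```r
library(dplyr)

# Hypothetical data frame that happens to contain a column called `var1`
df <- data.frame(var1 = c("a", "b"), cyl = c(4, 6), stringsAsFactors = FALSE)

label_bare <- function(data, var1) {
  # Bare `var1` is looked up in the data first, so the column wins
  data %>% mutate(Var1 = var1)
}

label_env <- function(data, var1) {
  # `.env$var1` always refers to the function argument
  data %>% mutate(Var1 = .env$var1)
}

bare_result <- label_bare(df, "cyl")$Var1 # values of the `var1` column
env_result <- label_env(df, "cyl")$Var1   # the string "cyl", recycled
```

The other changes follow the same spirit: `group_by(across(all_of(...)))` stands in for the superseded `group_by_at()`, and `all_of()` makes it explicit that `rename()` is given column names stored in a character vector.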
library(dplyr)
library(purrr)
# Function takes dataset and 2 categorical variables
# Calculates number of records for each combination of values in the two variables
# Calculates % of var 1 responses for each level of var2
ugh <- function(data, var1, var2) {
  # add checks to make sure vars on dataset
  tab_n <- data %>%
    group_by(across(all_of(c(var1, var2)))) %>%
    summarise(Numerator = n(), .groups = "drop") %>%
    group_by(across(all_of(var2))) %>%
    mutate(Denominator = sum(Numerator),
           Pct = Numerator / Denominator * 100,
           # storing names of var1 and var2 for future subsetting
           Var1 = .env$var1,
           Var2 = .env$var2) %>%
    rename(Var1_levels = all_of(var1),
           Var2_levels = all_of(var2))
}
# Sample output
combo1<-mtcars%>%ugh(var1="cyl", var2="gear")
# can also run this as:
# combo1<-ugh(data=mtcars, var1="cyl", var2="gear")
combo2<-mtcars%>%ugh(var1="cyl", var2="carb")
sampleOutput<-rbind(combo1, combo2)
sampleOutput
#> # A tibble: 17 × 7
#> # Groups: Var2_levels [7]
#> Var1_levels Var2_levels Numerator Denominator Pct Var1 Var2
#> <dbl> <dbl> <int> <int> <dbl> <chr> <chr>
#> 1 4 3 1 15 6.67 cyl gear
#> 2 4 4 8 12 66.7 cyl gear
#> 3 4 5 2 5 40 cyl gear
#> 4 6 3 2 15 13.3 cyl gear
#> 5 6 4 4 12 33.3 cyl gear
#> 6 6 5 1 5 20 cyl gear
#> 7 8 3 12 15 80 cyl gear
#> 8 8 5 2 5 40 cyl gear
#> 9 4 1 5 7 71.4 cyl carb
#> 10 4 2 6 10 60 cyl carb
#> 11 6 1 2 7 28.6 cyl carb
#> 12 6 4 4 10 40 cyl carb
#> 13 6 6 1 1 100 cyl carb
#> 14 8 2 4 10 40 cyl carb
#> 15 8 3 3 3 100 cyl carb
#> 16 8 4 6 10 60 cyl carb
#> 17 8 8 1 1 100 cyl carb
# Trying to use map to generate sampleOutput
var1_vector=rep("cyl", 2)
var2_vector=c("gear", "carb")
# Method 1 (recommended): use of anonymous functions
map2(var1_vector, var2_vector, \(var1, var2) ugh(mtcars, var1, var2)) %>%
list_rbind()
#> # A tibble: 17 × 7
#> # Groups: Var2_levels [7]
#> Var1_levels Var2_levels Numerator Denominator Pct Var1 Var2
#> <dbl> <dbl> <int> <int> <dbl> <chr> <chr>
#> 1 4 3 1 15 6.67 cyl gear
#> 2 4 4 8 12 66.7 cyl gear
#> 3 4 5 2 5 40 cyl gear
#> 4 6 3 2 15 13.3 cyl gear
#> 5 6 4 4 12 33.3 cyl gear
#> 6 6 5 1 5 20 cyl gear
#> 7 8 3 12 15 80 cyl gear
#> 8 8 5 2 5 40 cyl gear
#> 9 4 1 5 7 71.4 cyl carb
#> 10 4 2 6 10 60 cyl carb
#> 11 6 1 2 7 28.6 cyl carb
#> 12 6 4 4 10 40 cyl carb
#> 13 6 6 1 1 100 cyl carb
#> 14 8 2 4 10 40 cyl carb
#> 15 8 3 3 3 100 cyl carb
#> 16 8 4 6 10 60 cyl carb
#> 17 8 8 1 1 100 cyl carb
# If your version of R predates the \(x) lambda shorthand (added in R 4.1), use a purrr formula instead:
map2(var1_vector, var2_vector, ~ ugh(mtcars, .x, .y)) %>%
list_rbind()
#> # A tibble: 17 × 7
#> # Groups: Var2_levels [7]
#> Var1_levels Var2_levels Numerator Denominator Pct Var1 Var2
#> <dbl> <dbl> <int> <int> <dbl> <chr> <chr>
#> 1 4 3 1 15 6.67 cyl gear
#> 2 4 4 8 12 66.7 cyl gear
#> 3 4 5 2 5 40 cyl gear
#> 4 6 3 2 15 13.3 cyl gear
#> 5 6 4 4 12 33.3 cyl gear
#> 6 6 5 1 5 20 cyl gear
#> 7 8 3 12 15 80 cyl gear
#> 8 8 5 2 5 40 cyl gear
#> 9 4 1 5 7 71.4 cyl carb
#> 10 4 2 6 10 60 cyl carb
#> 11 6 1 2 7 28.6 cyl carb
#> 12 6 4 4 10 40 cyl carb
#> 13 6 6 1 1 100 cyl carb
#> 14 8 2 4 10 40 cyl carb
#> 15 8 3 3 3 100 cyl carb
#> 16 8 4 6 10 60 cyl carb
#> 17 8 8 1 1 100 cyl carb
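# Alternatively, using pmap(): the list names match ugh()'s var1 and var2 arguments,
# and mtcars is passed through ... to fill the remaining data argument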
args <- list(
var1 = var1_vector,
var2 = var2_vector
)
pmap(args, ugh, mtcars) %>%
list_rbind()
#> # A tibble: 17 × 7
#> # Groups: Var2_levels [7]
#> Var1_levels Var2_levels Numerator Denominator Pct Var1 Var2
#> <dbl> <dbl> <int> <int> <dbl> <chr> <chr>
#> 1 4 3 1 15 6.67 cyl gear
#> 2 4 4 8 12 66.7 cyl gear
#> 3 4 5 2 5 40 cyl gear
#> 4 6 3 2 15 13.3 cyl gear
#> 5 6 4 4 12 33.3 cyl gear
#> 6 6 5 1 5 20 cyl gear
#> 7 8 3 12 15 80 cyl gear
#> 8 8 5 2 5 40 cyl gear
#> 9 4 1 5 7 71.4 cyl carb
#> 10 4 2 6 10 60 cyl carb
#> 11 6 1 2 7 28.6 cyl carb
#> 12 6 4 4 10 40 cyl carb
#> 13 6 6 1 1 100 cyl carb
#> 14 8 2 4 10 40 cyl carb
#> 15 8 3 3 3 100 cyl carb
#> 16 8 4 6 10 60 cyl carb
#> 17 8 8 1 1 100 cyl carb
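For completeness, here is a minimal sketch of why the original call failed, assuming the error comes purely from argument matching. `describe()` is a hypothetical stand-in for `ugh()` with the same argument names and a trivial body. In `mtcars %>% map2_dfr(var1 = ..., var2 = ..., ugh)`, the piped `mtcars` fills `.x`, the unnamed function fills `.y`, and the named `var1`/`var2` vectors fall into `...`, so `.f` is never supplied.

```r
library(purrr)

# Hypothetical stand-in for ugh(): same argument names, trivial body
describe <- function(data, var1, var2) {
  data.frame(Var1 = var1, Var2 = var2, n_rows = nrow(data))
}

var1_vector <- rep("cyl", 2)
var2_vector <- c("gear", "carb")

# Same shape as the original call (written without the pipe):
# mtcars -> .x, describe -> .y, var1/var2 -> `...`, nothing left for .f
failed <- tryCatch(
  map2_dfr(mtcars, var1 = var1_vector, var2 = var2_vector, describe),
  error = function(e) "error"
)

# Iterate over the two name vectors and pass the data inside the function
fixed <- list_rbind(
  map2(var1_vector, var2_vector, \(v1, v2) describe(mtcars, v1, v2))
)
```

The working methods above all share this shape: the vectors being iterated over go in the `.x`/`.y` (or list) positions, and the constant data frame is supplied either inside the function or through `...`.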