Using map with a custom function (that returns a data frame) and multiple inputs

Question

I have a very simple function, that returns a data frame. The function takes three parameters, a dataset, and two variables that are present in the dataframe.

I was hoping to use the map/pmap family of functions to feed a vector/list of inputs and produce a single (long) output dataset. I can't seem to get map/pmap tools to work for me. What can I try next?

The function is pretty basic:

    library(dplyr)
    library(purrr)  # map2_dfr() comes from purrr

    # Function takes a dataset and 2 categorical variables
    # Calculates number of records for each combination of values in the two variables
    # Calculates % of var1 responses for each level of var2
    ugh <- function(data, var1, var2) {
      # add checks to make sure vars on dataset
      tab_n <- data %>%
        group_by_at(c(var1, var2)) %>%
        summarise(Numerator = n(), .groups = "drop") %>%
        group_by_at(c(var2)) %>%
        mutate(Denominator = sum(Numerator),
               Pct = Numerator / Denominator * 100,
               # storing names of var1 and var2 for future subsetting
               Var1 = var1,
               Var2 = var2) %>%
        rename(Var1_levels = var1,
               Var2_levels = var2)
    }

    # Sample output
    combo1 <- mtcars %>% ugh(var1 = "cyl", var2 = "gear")
    # can also run this as:
    # combo1 <- ugh(data = mtcars, var1 = "cyl", var2 = "gear")
    combo2 <- mtcars %>% ugh(var1 = "cyl", var2 = "carb")
    sampleOutput <- rbind(combo1, combo2)

    # Trying to use map to generate sampleOutput
    var1_vector <- rep("cyl", 2)
    var2_vector <- c("gear", "carb")
    plswork <- mtcars %>%
      map2_dfr(var1 = var1_vector, var2 = var2_vector, ugh)

The error message I get is:

> Error in as_mapper(.f, ...) : argument ".f" is missing, with no default

I've tried using ~ to specify the function and I've tried using map2 and binding rows separately, also tried pmap with a list of inputs... but am not having much luck.

(I am interested also in more efficient ways to summarise a subset of columns from a data frame by a different subset of columns.)

Answer 1

Score: 1

There are a few solutions to this problem. I recommend the first because I think it's the clearest. I also tweaked your function to replace deprecated or superseded functions and behaviour.
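On the error itself: a plausible reading, based on how R matches arguments, is that piping `mtcars` into `map2_dfr()` fills `.x` with the data frame, the named `var1 =` and `var2 =` arguments fall through to `...`, and `ugh` lands in the `.y` slot, which leaves `.f` empty. The sketch below illustrates that matching with `which_supplied()` and `ugh_stub()`, both hypothetical helpers made up for this illustration (they are not part of purrr); only the leading argument names are borrowed from `map2_dfr(.x, .y, .f, ...)`.

```r
# Hypothetical stand-in that has the same leading arguments as
# purrr::map2_dfr(.x, .y, .f, ...). It only reports which were supplied.
which_supplied <- function(.x, .y, .f, ...) {
  c(.x = !missing(.x), .y = !missing(.y), .f = !missing(.f))
}

# Hypothetical placeholder for ugh(); its body is irrelevant here.
ugh_stub <- function(data, var1, var2) NULL

# `mtcars %>% f(var1 = ..., var2 = ..., ugh)` is evaluated as
# `f(mtcars, var1 = ..., var2 = ..., ugh)`, so the same call shape is used.
supplied <- which_supplied(
  mtcars,
  var1 = c("cyl", "cyl"),
  var2 = c("gear", "carb"),
  ugh_stub
)

# .x and .y received values (mtcars and ugh_stub); .f received nothing.
print(supplied)
```

Under that reading, the fix is to pass the two vectors positionally as `.x` and `.y` and to hand the data frame to the function inside the mapper.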

    library(dplyr)
    library(purrr)

    # Function takes a dataset and 2 categorical variables
    # Calculates number of records for each combination of values in the two variables
    # Calculates % of var1 responses for each level of var2
    ugh <- function(data, var1, var2) {
      # add checks to make sure vars on dataset
      tab_n <- data %>%
        group_by(across(all_of(c(var1, var2)))) %>%
        summarise(Numerator = n(), .groups = "drop") %>%
        group_by(across(all_of(var2))) %>%
        mutate(Denominator = sum(Numerator),
               Pct = Numerator / Denominator * 100,
               # storing names of var1 and var2 for future subsetting
               Var1 = .env$var1,
               Var2 = .env$var2) %>%
        rename(Var1_levels = all_of(var1),
               Var2_levels = all_of(var2))
    }

    # Sample output
    combo1 <- mtcars %>% ugh(var1 = "cyl", var2 = "gear")
    # can also run this as:
    # combo1 <- ugh(data = mtcars, var1 = "cyl", var2 = "gear")
    combo2 <- mtcars %>% ugh(var1 = "cyl", var2 = "carb")
    sampleOutput <- rbind(combo1, combo2)
    sampleOutput
    #> # A tibble: 17 × 7
    #> # Groups:   Var2_levels [7]
    #>    Var1_levels Var2_levels Numerator Denominator    Pct Var1  Var2
    #>          <dbl>       <dbl>     <int>       <int>  <dbl> <chr> <chr>
    #>  1           4           3         1          15   6.67 cyl   gear
    #>  2           4           4         8          12  66.7  cyl   gear
    #>  3           4           5         2           5  40    cyl   gear
    #>  4           6           3         2          15  13.3  cyl   gear
    #>  5           6           4         4          12  33.3  cyl   gear
    #>  6           6           5         1           5  20    cyl   gear
    #>  7           8           3        12          15  80    cyl   gear
    #>  8           8           5         2           5  40    cyl   gear
    #>  9           4           1         5           7  71.4  cyl   carb
    #> 10           4           2         6          10  60    cyl   carb
    #> 11           6           1         2           7  28.6  cyl   carb
    #> 12           6           4         4          10  40    cyl   carb
    #> 13           6           6         1           1 100    cyl   carb
    #> 14           8           2         4          10  40    cyl   carb
    #> 15           8           3         3           3 100    cyl   carb
    #> 16           8           4         6          10  60    cyl   carb
    #> 17           8           8         1           1 100    cyl   carb

    # Trying to use map to generate sampleOutput
    var1_vector <- rep("cyl", 2)
    var2_vector <- c("gear", "carb")

    # Method 1 (recommended): use of anonymous functions
    map2(var1_vector, var2_vector, \(var1, var2) ugh(mtcars, var1, var2)) %>%
      list_rbind()
    # Output: identical to sampleOutput above (17 × 7 tibble)

    # If your R version predates the \(x) lambda shorthand (added in R 4.1.0),
    # use a purrr formula instead:
    map2(var1_vector, var2_vector, ~ ugh(mtcars, .x, .y)) %>%
      list_rbind()
    # Output: identical to sampleOutput above (17 × 7 tibble)

    # Alternatively, using pmap(): the list names match ugh()'s argument names,
    # and mtcars is passed along as the remaining (data) argument
    args <- list(
      var1 = var1_vector,
      var2 = var2_vector
    )
    pmap(args, ugh, mtcars) %>%
      list_rbind()
    # Output: identical to sampleOutput above (17 × 7 tibble)
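A variation on the `pmap()` approach is to keep the argument combinations in a small table, one row per call. The sketch below is only an illustration under a few assumptions: `ugh_ungrouped()` and `combos` are names invented here, the function body copies the answer's `ugh()` with an added `ungroup()` so the combined result is not left grouped by `Var2_levels`, and `list_rbind()` assumes purrr 1.0.0 or later.

```r
library(dplyr)
library(purrr)

# Same logic as ugh() in the answer, plus ungroup() (an added design choice).
# The name ugh_ungrouped is hypothetical.
ugh_ungrouped <- function(data, var1, var2) {
  data %>%
    group_by(across(all_of(c(var1, var2)))) %>%
    summarise(Numerator = n(), .groups = "drop") %>%
    group_by(across(all_of(var2))) %>%
    mutate(Denominator = sum(Numerator),
           Pct = Numerator / Denominator * 100,
           Var1 = .env$var1,
           Var2 = .env$var2) %>%
    ungroup() %>%
    rename(Var1_levels = all_of(var1),
           Var2_levels = all_of(var2))
}

# One row per call; the column names match the function's argument names,
# so pmap() can pass each row's values by name.
combos <- tibble(
  var1 = c("cyl", "cyl"),
  var2 = c("gear", "carb")
)

result <- pmap(combos, \(var1, var2) ugh_ungrouped(mtcars, var1, var2)) %>%
  list_rbind()

result
```

If every pairing of two sets of variables is wanted, `tidyr::expand_grid()` could build `combos` instead of typing the rows by hand.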

huangapple
  • Published on April 11, 2023, 11:47:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75982251.html