Difficulty with user-defined function to perform operation on multiple variables in R

Question

I am doing t-tests on variables within a data frame:

```R
library(rstatix)

# Example data: two groups of 50 observations each
df <- data.frame(grouping = c(rep("left", 50), rep("right", 50)),
                 var1 = rnorm(100, mean = 21, sd = 3))

var1_result <- df %>%
  t_test(var1 ~ grouping, paired = TRUE, detailed = TRUE) %>%
  add_significance()

var1_result
```

I have this working with repeated lines of code for each variable, but I would like to improve it by calling a user-defined function instead. I tried:

```R
my_t_test <- function(dataset, parameter, grouping_variable) {
  parameter <- dataset %>% t_test({{parameter}} ~ {{grouping_variable}}, paired = TRUE, detailed = TRUE) %>%
    add_significance()
  return(parameter)
}
my_t_test(df, var1, grouping)
```

However, I am encountering the error: "Error in `pull()`: ! Can't extract columns that don't exist. ✖ Column `...` doesn't exist."

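I found a few posts that address calling data frame variables within a function written in dplyr style (e.g., https://stackoverflow.com/questions/73156960/how-can-i-write-a-function-in-r-which-accepts-column-names-like-dplyr and https://stackoverflow.com/questions/48072364/writing-a-scoped-filter-function-in-dplyr).

I tried writing my function with `...` instead, as suggested by the first post, but this did not work, and I had trouble generalizing the solutions from the other posts. I am very interested in learning more about proper notation and scoping in user-defined functions when using dplyr.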



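# Answer 1
**Score**: 0

You need to be a bit more careful when putting `{{}}` expressions into a formula, because the left and right sides of a formula are left unevaluated. One possible workaround is: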

```R
my_t_test <- function(dataset, parameter, grouping_variable) {
  # Capture the unevaluated column names and assemble `parameter ~ grouping_variable`
  formula <- do.call("~", list(rlang::enexpr(parameter), rlang::enexpr(grouping_variable)))
  result <- dataset %>%
    t_test(formula, paired = TRUE, detailed = TRUE) %>%
    add_significance()
  return(result)
}
```

Here we call the `~` function to build the formula and use `enexpr` to capture the appropriate symbols.

This should produce the same output (the exact numbers depend on the random draw, since no seed is set):

```
my_t_test(df, var1, grouping)
# A tibble: 1 × 14
  estimate .y.   group1 group2    n1    n2 stati…¹     p    df conf.…² conf.…³ method
     <dbl> <chr> <chr>  <chr>  <int> <int>   <dbl> <dbl> <dbl>   <dbl>   <dbl> <chr> 
1    0.114 var1  left   right     50    50   0.162 0.872    49   -1.30    1.53 T-test
# … with 2 more variables: alternative <chr>, p.signif <chr>, and abbreviated
#   variable names ¹statistic, ²conf.low, ³conf.high
# ℹ Use `colnames()` to see all variable names
```
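
Since the original goal was to avoid repeating the same lines for every variable, here is a minimal sketch of one way to go from a vector of column names to one formula per variable. It uses base R's `reformulate()` instead of the `enexpr` approach, which is a swapped-in technique and not part of the answer. The column `var2`, the `variables` vector, and the `formulas` and `results` names are all assumptions made up for illustration.

```r
# Hypothetical data: var2 is an extra column added only for this illustration.
set.seed(1)
df <- data.frame(
  grouping = rep(c("left", "right"), each = 50),
  var1 = rnorm(100, mean = 21, sd = 3),
  var2 = rnorm(100, mean = 30, sd = 5)
)

# Column names held as strings (an assumption of this sketch).
variables <- c("var1", "var2")

# reformulate() is base R (stats): it builds `response ~ termlabels`.
formulas <- lapply(variables, function(v) reformulate("grouping", response = v))
names(formulas) <- variables

formulas$var2  # var2 ~ grouping

# Pass each formula to t_test() the same way the answer passes `formula`.
# Guarded so the formula-building part still runs without rstatix installed.
if (requireNamespace("rstatix", quietly = TRUE)) {
  results <- lapply(formulas, function(f) {
    rstatix::t_test(df, f, paired = TRUE, detailed = TRUE)
  })
}
```

Working with strings sidesteps non-standard evaluation entirely, at the cost of the bare-name calling style (`my_t_test(df, var1, grouping)`) that the question asked for.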

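Note that `{{}}` is not standard R syntax and only works with packages that use rlang as a back end (mainly those in the tidyverse). It just so happens that `rstatix::t_test` uses dplyr in the back end.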
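
To make the point about unevaluated formula sides concrete, the following sketch uses only base R. It shows that, outside of rlang-aware functions, `{{ }}` is parsed as two nested `{` calls, so a formula written that way still refers to the function's argument names instead of the caller's columns. The function names `naive_formula` and `captured_formula` are hypothetical, and `substitute()` is used here as a base R stand-in for `rlang::enexpr`.

```r
# Base R only; no rlang involved.

# Mirrors the failing attempt: in base R, `{{ x }}` is just two nested `{` calls,
# and a formula never evaluates its two sides.
naive_formula <- function(parameter, grouping_variable) {
  {{parameter}} ~ {{grouping_variable}}
}

# Base R counterpart of the answer's approach: substitute() captures the
# expressions the caller typed, and do.call("~", ...) builds the formula.
captured_formula <- function(parameter, grouping_variable) {
  do.call("~", list(substitute(parameter), substitute(grouping_variable)))
}

all.vars(naive_formula(var1, grouping))     # "parameter" "grouping_variable"
all.vars(captured_formula(var1, grouping))  # "var1" "grouping"
```

The first formula carries the names `parameter` and `grouping_variable`, which are not columns of `df`, which is consistent with the column-lookup failure reported in the question.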

huangapple
  • Posted on July 28, 2023 at 01:18:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76782108.html