Run sapply function with 2 inputs (variable and dataframe)

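Question

Your question is how to modify the code so that it runs the t-tests for several variables and combines the results in one table. The error comes from how the inputs reach the function: sapply iterates only over the data frames in df_list and hands the whole var_list to every call, and inside the function $var looks for a column literally named var, which does not exist. To fix this, loop over the variable list with lapply, run the t-tests on every subset in each iteration, and select the column with [[var]]. The modified code: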

library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl", "drat", "disp")

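multiple_wt_ttest <- function(df, var) {
  d0 <- subset(df, am == 0)
  d1 <- subset(df, am == 1)
  # Skip subsets that contain only one am value: there is nothing to compare
  if (nrow(d0) == 0 || nrow(d1) == 0) return(NULL)
  ttest <- wtd.t.test(x = d0[[var]], y = d1[[var]],
                      weight = d0$wt, weighty = d1$wt,
                      samedata = FALSE)
  # Second element of the result: t.value, df and p.value
  ttest[[2]]
}

results <- lapply(var_list, function(var) {
  # sapply returns a list here because skipped subsets yield NULL;
  # rbind ignores those NULL entries
  store <- do.call(rbind, sapply(df_list, multiple_wt_ttest, var = var))
  # Prefix the column names with the variable so they stay unique after cbind
  colnames(store) <- paste(var, colnames(store), sep = ".")
  store
})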

final_result <- do.call(cbind, results)

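This loops over your variable list var_list, runs the t-test for each variable on every subset, and stores the results in a list. Finally, do.call(cbind, ...) merges all results column-wise into one table.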

You can adjust the naming and format of the results as needed.
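When only a single number per subset and variable is needed, the same two-level loop can also be written as nested sapply calls that return one column per variable. The sketch below is only an illustration: wt_mean_diff is a hypothetical helper (a difference in weighted means computed with base R's weighted.mean) standing in for wtd.t.test, so it runs without the weights package.

```r
df_list  <- split(mtcars, mtcars$carb)
var_list <- c("cyl", "drat", "disp")

# Hypothetical stand-in for the weighted t-test (not part of the original code):
# difference in wt-weighted means between am == 0 and am == 1
wt_mean_diff <- function(df, var) {
  i0 <- df$am == 0
  i1 <- df$am == 1
  # Subsets that contain only one am value cannot be compared
  if (!any(i0) || !any(i1)) return(NA_real_)
  weighted.mean(df[[var]][i0], df$wt[i0]) - weighted.mean(df[[var]][i1], df$wt[i1])
}

# Outer sapply walks the variables, inner sapply walks the subsets;
# every call returns one number, so the result simplifies to a matrix
wide <- sapply(var_list, function(v) sapply(df_list, wt_mean_diff, var = v))
print(dim(wide))       # 6 rows (carb levels) by 3 columns (variables)
print(colnames(wide))
```

Returning NA instead of NULL for subsets without both am groups keeps every result the same length, which is what lets sapply build the matrix directly.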


I am running a function to perform weighted two-sample t-tests on multiple subsets of a dataframe. A reproducible version of my code (using the mtcars dataset) is the following:

library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
multiple_wt_ttest <- function(df) {ttest = wtd.t.test(x=subset(df, am == 0)$disp,y=subset(df, am == 1)$disp,
weight=subset(df, am == 0)$wt,weighty=subset(df, am == 1)$wt,samedata=FALSE)
 out <<- ttest[2]}

store <- do.call(rbind, sapply(df_list, multiple_wt_ttest))

This yields a dataframe displaying the desired t-test for each subset of mtcars based on the variable carb. Now I want to repeat this, not just for the variable disp but for multiple variables in the dataframe (in mtcars, for example, drat, cyl, gear, etc.). The code would therefore be something like the following:

library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")
multiple_wt_ttest <- function(df,var) {ttest = wtd.t.test(x=subset(df, am == 0)$var,y=subset(df, am == 1)$var,
weight=subset(df, am == 0)$wt,weighty=subset(df, am == 1)$wt,samedata=FALSE)
 out <<- ttest[2]}

store <- do.call(rbind, sapply(df_list,var=var_list, multiple_wt_ttest))

But this does not work and yields the error:
Error in var(x) : 'x' is NULL

I think this has to do with the fact that the original sapply is providing a dataframe, whereas the new var_list is a vector/list of variables. How, then, can I combine 2 different inputs in my sapply function to repeat this process of t-tests on each subset of the data for multiple variables (instead of just one) and compile the results next to each other in a table?
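The NULL in that error can be reproduced with base R alone. The snippet below is a small sketch, not taken from the original post: it shows that $ takes the name after it literally, while [[ uses the string stored in the variable, and that passing the resulting NULL to a variance calculation fails.

```r
var <- "disp"

literal_lookup <- mtcars$var     # `$` searches for a column literally named "var"
by_value       <- mtcars[[var]]  # `[[` uses the string stored in `var`

print(is.null(literal_lookup))           # TRUE
print(identical(by_value, mtcars$disp))  # TRUE

# Handing the NULL to a variance calculation raises an error
failed <- inherits(tryCatch(stats::var(literal_lookup), error = function(e) e), "error")
print(failed)
```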

Answer 1

Score: 2


Here is a solution.

First of all, I have corrected the function so that it can cope with input data.frames that contain only one am value, such as data with only one row.
Then, call the code that runs for one variable in an lapply loop over the list of variables.

library(weights)
#> Loading required package: Hmisc
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units

multiple_wt_ttest <- function(df, target_var) {
  i0 <- df$am == 0
  i1 <- df$am == 1
  if(any(i0) && any(i1)) {
    ttest <- wtd.t.test(
      x = df[[target_var]][i0],
      y = df[[target_var]][i1],
      weight = df$wt[i0],
      weighty = df$wt[i1],
      samedata = FALSE
    )
    ttest[[2]]
  } else NULL
}

df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")

results_list <- lapply(var_list, \(v) {
  store <- do.call(rbind, sapply(df_list, multiple_wt_ttest, target_var = v))
  store <- as.data.frame(store)
  store$variable <- v
  store[c(4, 1:3)]
})

do.call(rbind, results_list)
#>    variable   t.value       df     p.value
#> 1       cyl  2.327192 2.000000 0.145420369
#> 2       cyl  3.351162 5.000000 0.020303028
#> 4       cyl  1.068365 3.070500 0.362061152
#> 11     drat -3.335558 2.345842 0.063563101
#> 21     drat -3.633611 6.293180 0.010048620
#> 41     drat -3.455307 7.778648 0.009008048
#> 12     disp  3.069880 2.183101 0.082230383
#> 22     disp  3.897422 5.560369 0.009295961
#> 42     disp  1.697305 4.282223 0.160142699

Created on 2023-05-26 with reprex v2.0.2
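The output above has rows only for carb 1, 2 and 4 because the function returns NULL for subsets that lack one of the am groups, and rbind ignores NULL arguments. A minimal base R sketch of that behaviour, using a made-up named list in place of the real per-subset results:

```r
# Named list mimicking per-subset results; the values are invented for
# illustration and "b" stands for a subset the function skipped
parts <- list(a = c(t.value = 1, df = 2), b = NULL, c = c(t.value = 3, df = 4))

combined <- do.call(rbind, parts)
print(rownames(combined))  # "a" "c"
print(dim(combined))       # 2 2
```

Because the skipped entry contributes no row, the row names of the combined table still identify which subsets were actually tested.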

huangapple
  • This article was published on May 26, 2023, 11:16:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76337436.html