Run sapply function with 2 inputs (variable and dataframe)


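Your question seems to be how to modify the code so that it runs the t-test for several variables and combines the results in one table. The error most likely comes from subset(df, am == 0)$var: the $ operator looks for a column literally named var, finds none, and returns NULL, whereas [[var]] uses the value stored in var. In addition, sapply iterates over a single object, so passing var_list as an extra argument hands the whole list to every call. To fix this, loop over the variable list with lapply and run the t-test on each subset inside each iteration. The modified code is as follows: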

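library(weights)  # provides wtd.t.test()

df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl", "drat", "disp")

multiple_wt_ttest <- function(df, var) {
  g0 <- subset(df, am == 0)
  g1 <- subset(df, am == 1)
  # Skip subsets that contain only one am group
  if (nrow(g0) == 0 || nrow(g1) == 0) return(NULL)
  ttest <- wtd.t.test(x = g0[[var]], y = g1[[var]],
                      weight = g0$wt, weighty = g1$wt,
                      samedata = FALSE)
  ttest[[2]]  # t.value, df, p.value
}

results <- lapply(var_list, function(var) {
  store <- do.call(rbind, lapply(df_list, multiple_wt_ttest, var = var))
  # Prefix the columns with the variable name so they stay distinct
  colnames(store) <- paste(var, colnames(store), sep = ".")
  store
})

final_result <- do.call(cbind, results)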

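This loops over your variable list var_list, runs the t-test for each variable, and stores the results in a list. Finally, do.call(cbind, ...) merges all results column-wise into a single table.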
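The two-level loop described above can be tried without any extra package. The sketch below is only an illustration: it replaces the weighted t-test with a stand-in statistic (the difference in wt-weighted means), and the helper name weighted_mean_diff is hypothetical, not part of the original code.

```r
# Stand-in statistic (an assumption for illustration, not the t-test itself):
# difference in wt-weighted means of `var` between am == 0 and am == 1.
weighted_mean_diff <- function(df, var) {
  g0 <- df[df$am == 0, ]
  g1 <- df[df$am == 1, ]
  # Return NA when one of the two groups is missing from this subset
  if (nrow(g0) == 0 || nrow(g1) == 0) return(NA_real_)
  weighted.mean(g0[[var]], g0$wt) - weighted.mean(g1[[var]], g1$wt)
}

df_list  <- split(mtcars, mtcars$carb)
var_list <- c("cyl", "drat", "disp")

# Outer loop over variables, inner loop over the carb subsets.
# The result is a matrix with one row per subset and one column per variable.
wide <- sapply(var_list, function(v) {
  sapply(df_list, weighted_mean_diff, var = v)
})

print(wide)
```

Because the outer sapply runs over a character vector, the columns are named after the variables, which gives the side-by-side layout asked for in the question.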



I am running a function to perform weighted two-sample t-tests on multiple subsets of a dataframe. A reproducible version of my code (using the mtcars dataset) is the following:

library(weights)  # provides wtd.t.test()

df_list <- split(mtcars, mtcars$carb)
multiple_wt_ttest <- function(df) {
  ttest <- wtd.t.test(x = subset(df, am == 0)$disp, y = subset(df, am == 1)$disp,
                      weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                      samedata = FALSE)
  out <<- ttest[2]
}

store <- do.call(rbind, sapply(df_list, multiple_wt_ttest))

This yields a dataframe displaying the desired t-test for each subset of mtcars, split by the variable carb. Now I want to repeat this, not just for the variable disp but for multiple variables in the dataframe (in mtcars, for example, drat, cyl, gear, etc.). The code would therefore be something like the following:

df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl", "drat", "disp")
multiple_wt_ttest <- function(df, var) {
  ttest <- wtd.t.test(x = subset(df, am == 0)$var, y = subset(df, am == 1)$var,
                      weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                      samedata = FALSE)
  out <<- ttest[2]
}

store <- do.call(rbind, sapply(df_list, var = var_list, multiple_wt_ttest))

But this does not work and yields the error:
Error in var(x) : 'x' is NULL

I think this has to do with the fact that the original sapply call is given a list of dataframes, whereas the new var_list is a vector/list of variable names. How, then, can I combine two different inputs in my sapply call to repeat the t-tests on each subset of the data for multiple variables (instead of just one) and compile the results next to each other in a table?
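The 'x' is NULL message is consistent with how $ treats its right-hand side: it is taken as a literal column name and is not evaluated. A minimal base-R check on mtcars, with var holding a column name as in the code above:

```r
var <- "disp"

# `$` does not evaluate `var`; it looks for a column literally named "var".
# mtcars has no such column, so the result is NULL.
by_dollar <- mtcars$var

# `[[` evaluates `var` first, so this is the same as mtcars[["disp"]].
by_brackets <- mtcars[[var]]

print(is.null(by_dollar))
print(identical(by_brackets, mtcars$disp))
```

Any function that then receives by_dollar as its data gets NULL instead of a numeric vector.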


Score: 2



Here is a solution.

First, I corrected the function so that it can cope with input data frames that contain only one am value, such as subsets with a single row.
Then, the code that handles one variable is called in an lapply loop over the list of variables.
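Which subsets such a guard skips can be checked in base R before running any test. The sketch below is an illustration only: has_both_groups is a hypothetical helper, and the two numeric rows are copied from the cyl results printed further down, merely to show that rbind ignores NULL entries.

```r
# Does a subset contain both transmission groups (am == 0 and am == 1)?
has_both_groups <- function(df) any(df$am == 0) && any(df$am == 1)

df_list  <- split(mtcars, mtcars$carb)
testable <- sapply(df_list, has_both_groups)
print(names(testable)[testable])  # subsets on which a two-sample test can run

# rbind() ignores NULL entries, so a skipped subset simply drops out
# of the combined table and the remaining rows keep their names.
per_subset <- list(
  "1" = c(t.value = 2.327192, p.value = 0.145420369),
  "3" = NULL,
  "4" = c(t.value = 1.068365, p.value = 0.362061152)
)
combined <- do.call(rbind, per_subset)
print(combined)
```

This is why returning NULL for an untestable subset is a convenient choice: no placeholder rows need to be filtered out afterwards.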

library(weights)
#> Loading required package: Hmisc
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#>     format.pval, units

multiple_wt_ttest <- function(df, target_var) {
  i0 <- df$am == 0
  i1 <- df$am == 1
  # Run the test only when both am groups are present in this subset
  if(any(i0) && any(i1)) {
    ttest <- wtd.t.test(
      x = df[[target_var]][i0],
      y = df[[target_var]][i1],
      weight = df$wt[i0],
      weighty = df$wt[i1],
      samedata = FALSE
    )
    ttest[[2]]  # t.value, df, p.value
  } else NULL
}

df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")

results_list <- lapply(var_list, \(v) {
  store <- do.call(rbind, sapply(df_list, multiple_wt_ttest, target_var = v))
  store <- as.data.frame(store)
  store$variable <- v
  store[c(4, 1:3)]
})

do.call(rbind, results_list)
#>    variable   t.value       df     p.value
#> 1       cyl  2.327192 2.000000 0.145420369
#> 2       cyl  3.351162 5.000000 0.020303028
#> 4       cyl  1.068365 3.070500 0.362061152
#> 11     drat -3.335558 2.345842 0.063563101
#> 21     drat -3.633611 6.293180 0.010048620
#> 41     drat -3.455307 7.778648 0.009008048
#> 12     disp  3.069880 2.183101 0.082230383
#> 22     disp  3.897422 5.560369 0.009295961
#> 42     disp  1.697305 4.282223 0.160142699

Created on 2023-05-26 with reprex v2.0.2
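The table above is in long format. If the results should sit next to each other, one possible approach is base R's reshape. The sketch below retypes the p-values from the printed output and assumes that the row names 1, 2 and 4 identify the carb subsets; the carb column is added here for illustration and is not produced by the code above.

```r
# p-values copied from the printed output; `carb` is an assumed id column
# recovered from the row names (1, 2, 4) of each per-variable block.
long <- data.frame(
  carb     = rep(c("1", "2", "4"), times = 3),
  variable = rep(c("cyl", "drat", "disp"), each = 3),
  p.value  = c(0.145420369, 0.020303028, 0.362061152,
               0.063563101, 0.010048620, 0.009008048,
               0.082230383, 0.009295961, 0.160142699)
)

# One row per carb subset, one p.value column per variable
wide <- reshape(long, idvar = "carb", timevar = "variable", direction = "wide")
print(wide)
```

The same call also works with several value columns (for example t.value and df as well), in which case each one gets a suffix per variable.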

  • Posted on 2023-05-26 11:16:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76337436.html



