Run sapply function with 2 inputs (variable and dataframe)
Question
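The question is how to adapt the code so that the t-test runs for several variables and the results end up in one table. Two things go wrong in the attempt: subset(df, am == 0)$var looks for a column literally named var (which does not exist, so it returns NULL), and sapply(df_list, var = var_list, ...) hands the whole var_list to every call instead of one variable at a time. One way around this is to loop over the variable list with lapply, look up the column with [[var]], and run the t-test for every subset in each iteration. Here is the modified code: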
library(weights)

df_list <- split(mtcars, mtcars$carb)
var_list <- c("cyl", "drat", "disp")

# Weighted two-sample t-test of `var` between am == 0 and am == 1 in one subset.
# Returns NULL when the subset lacks one of the two groups.
multiple_wt_ttest <- function(df, var) {
  i0 <- df$am == 0
  i1 <- df$am == 1
  if (!any(i0) || !any(i1)) return(NULL)
  ttest <- wtd.t.test(x = df[[var]][i0], y = df[[var]][i1],
                      weight = df$wt[i0], weighty = df$wt[i1],
                      samedata = FALSE)
  ttest[[2]]  # t.value, df, p.value
}

results <- lapply(var_list, function(var) {
  # One row per carb subset; rbind drops the NULL results
  store <- do.call(rbind, lapply(df_list, multiple_wt_ttest, var = var))
  colnames(store) <- paste(var, colnames(store), sep = ".")
  store
})
final_result <- do.call(cbind, results)
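This loops over var_list, runs the t-test for each variable in every carb subset, and stores the results in a list. Finally, do.call with cbind combines all results column-wise into a single table.
The naming and layout of the result can be adjusted as needed.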
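To see why looking a column up through a variable needs [[ rather than $, a small base-R check can help. This is an illustrative sketch that is not part of the original post; it only uses mtcars and a variable named var, mirroring the question.

```r
var <- "disp"   # column name held in a variable, as in the question

# `$` takes its right-hand side literally, so this looks for a column named "var"
is.null(mtcars$var)                      # TRUE

# `[[` evaluates its argument, so this returns the disp column
identical(mtcars[[var]], mtcars$disp)    # TRUE
```

Passing that NULL on as x is consistent with the reported error message ('x' is NULL), although the exact call that raises it sits inside wtd.t.test.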
Original question:
I am running a function to perform weighted two-sample t-tests on multiple subsets of a dataframe. A reproducible version of my code (using the mtcars dataset) is the following:
library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
multiple_wt_ttest <- function(df) {
  ttest = wtd.t.test(x = subset(df, am == 0)$disp, y = subset(df, am == 1)$disp,
                     weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                     samedata = FALSE)
  out <<- ttest[2]
}
store <- do.call(rbind, sapply(df_list, multiple_wt_ttest))
This yields a dataframe displaying the desired t-test for each subset of mtcars, split on the variable carb. Now I want to repeat this, not just for the variable disp but for multiple variables in the dataframe (in mtcars, for example, drat, cyl, gear, etc.). The code would therefore be something like the following:
library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")
multiple_wt_ttest <- function(df, var) {
  ttest = wtd.t.test(x = subset(df, am == 0)$var, y = subset(df, am == 1)$var,
                     weight = subset(df, am == 0)$wt, weighty = subset(df, am == 1)$wt,
                     samedata = FALSE)
  out <<- ttest[2]
}
store <- do.call(rbind, sapply(df_list,var=var_list, multiple_wt_ttest))
But this does not work and yields the error:
Error in var(x) : 'x' is NULL
I think this has to do with the fact that the original sapply is providing a dataframe, whereas the new var_list is a vector/list of variables. How, then, can I combine 2 different inputs in my sapply function to repeat this process of t-tests on each subset of the data, for multiple variables (instead of just one), and compile the results next to each other in a table?
Answer 1
Score: 2
Here is a solution.
First of all, I have corrected the function so that it can cope with input data.frames with only one am value, such as data with only one row.
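To see which subsets are affected, a quick check along the following lines can help. It is only a sketch in base R; has_both is a name introduced here for illustration.

```r
df_list <- split(mtcars, mtcars$carb)

# TRUE for subsets that contain both am == 0 and am == 1 rows
has_both <- vapply(df_list, function(d) all(c(0, 1) %in% d$am), logical(1))

names(df_list)[has_both]    # "1" "2" "4"
```

Only the carb 1, 2 and 4 subsets contain both groups, which is why only those three appear in the result table.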
Then, call the code that runs for one variable in a lapply loop on the variables list.
library(weights)
#> Loading required package: Hmisc
#>
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#>
#> format.pval, units
multiple_wt_ttest <- function(df, target_var) {
  i0 <- df$am == 0
  i1 <- df$am == 1
  if(any(i0) && any(i1)) {
    ttest <- wtd.t.test(
      x = df[[target_var]][i0],
      y = df[[target_var]][i1],
      weight = df$wt[i0],
      weighty = df$wt[i1],
      samedata = FALSE
    )
    ttest[[2]]
  } else NULL
}
df_list <- split(mtcars, mtcars$carb)
var_list <- list("cyl","drat","disp")
results_list <- lapply(var_list, \(v) {
  store <- do.call(rbind, sapply(df_list, multiple_wt_ttest, target_var = v))
  store <- as.data.frame(store)
  store$variable <- v
  store[c(4, 1:3)]
})
do.call(rbind, results_list)
#>    variable   t.value       df     p.value
#> 1       cyl  2.327192 2.000000 0.145420369
#> 2       cyl  3.351162 5.000000 0.020303028
#> 4       cyl  1.068365 3.070500 0.362061152
#> 11     drat -3.335558 2.345842 0.063563101
#> 21     drat -3.633611 6.293180 0.010048620
#> 41     drat -3.455307 7.778648 0.009008048
#> 12     disp  3.069880 2.183101 0.082230383
#> 22     disp  3.897422 5.560369 0.009295961
#> 42     disp  1.697305 4.282223 0.160142699
Created on 2023-05-26 with reprex v2.0.2
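In the stacked table, the carb label survives only in the row names, which rbind makes unique by appending a digit (11, 21, 41 are carb 1, 2, 4 for drat). If the goal is to have the variables next to each other with carb as an explicit column, one possible approach is to iterate over every (subset, variable) pair with mapply and then reshape to wide. The sketch below is an assumption-laden illustration, not the answer's method: to stay runnable in base R it swaps the weighted t-test for a simple difference in wt-weighted means (wmean_diff, a hypothetical helper), and the same pattern would apply with a function that returns a single t-test value.

```r
df_list <- split(mtcars, mtcars$carb)
var_list <- c("cyl", "drat", "disp")

# Stand-in statistic (NOT the t-test): difference in wt-weighted means
# between am == 0 and am == 1; NA when a subset lacks one of the groups.
wmean_diff <- function(df, var) {
  i0 <- df$am == 0
  i1 <- df$am == 1
  if (!any(i0) || !any(i1)) return(NA_real_)
  weighted.mean(df[[var]][i0], df$wt[i0]) - weighted.mean(df[[var]][i1], df$wt[i1])
}

# Every (subset, variable) pair, one per row
grid <- expand.grid(carb = names(df_list), variable = var_list,
                    stringsAsFactors = FALSE)

# mapply walks both columns in parallel: each call gets one subset and one variable
grid$diff <- mapply(function(k, v) wmean_diff(df_list[[k]], v),
                    grid$carb, grid$variable, USE.NAMES = FALSE)

# Drop subsets without both groups, then put the variables side by side
grid <- grid[!is.na(grid$diff), ]
wide <- reshape(grid, idvar = "carb", timevar = "variable", direction = "wide")
names(wide)   # "carb" "diff.cyl" "diff.drat" "diff.disp"
```

The wide table has one row per carb subset and one column per variable, at the cost of carrying only a single statistic per cell.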