With MICE package, how to create a list of models from a list of variables to test with glm
Question
I want to do a t-test (or chi-squared test) to estimate the difference in variables between `grou=0` and `grou=1`. All variables in the dataset are imputed by MICE. The variables include `AGE`, `SCORE`, `GENDER`, `HEART`, etc.; `AGE` and `SCORE` are continuous variables, and `GENDER` and `HEART` are categorical variables.
If the t-test is done for only one variable at a time, I know the code is:
library(mice)
data_im <- mice(data, m=5, seed=6666)
summary(pool(with(data_im, glm(AGE~grou))))
The output `p-value` is also the `p-value` of the t-test.
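As a rough check of that claim on a single complete (non-imputed) data set, the sketch below compares the p-value from a Gaussian `glm()` with the one from a pooled-variance t-test. It uses the built-in `mtcars` data as a stand-in, with `am` playing the role of `grou`; these names are assumptions for illustration and are not part of the original data. Note that the match holds for `var.equal = TRUE`, not for the default Welch test.

```r
# Built-in data: mpg is continuous, am is a 0/1 grouping variable
fit <- glm(mpg ~ am, data = mtcars)   # gaussian family by default
p_glm <- coef(summary(fit))["am", "Pr(>|t|)"]

# Classical two-sample t-test with pooled (equal) variance
p_t <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)$p.value

# Compare the two p-values up to numerical tolerance
all.equal(p_glm, p_t)
```

With multiply imputed data the pooled p-value comes from combining the per-imputation fits, so this equivalence should be read as applying within each completed data set.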
However, there are too many variables to evaluate one at a time, so I would like to write a for loop or a function that outputs the summary test results for multiple variables at once.
I have tried to write:
vars <- c("AGE","SCORE","GENDER","HEART")
afterMICE <- c()
for(i in 1:4){
  pool_fitMICE <- pool(with(data_im, glm(substitute(y ~ grou, list(y=as.name(vars[i]))))))
}
**Error in eval(predvars, data, env) : object 'AGE' not found**
afterMICE <- rbind(afterMICE, c(vars[i], coef(summary(pool_fitMICE))[2,c(1,2,4)]))
I know the reason for the error is that `data_im` is not a regular data frame.
How can I modify the code to batch-output the summary results for the different variables?
#------------------------------------------------------------------------------
Edit:
`mice()` is a function to do multivariate imputation for missing data. https://www.rdocumentation.org/packages/mice/versions/3.15.0/topics/mice
Take `data("nhanes2")` for example and we can see the structure of `data_im`.
library(mice)
data("nhanes2")
vars <- c("bmi", "chl", "age", "hyp")
catvars <- c("age", "hyp")
data_im <- mice(nhanes2, m=5, seed=6666)
pool.fits <- pool(with(data_im, glm(age~hyp)))
But we need to batch pool the results of `pool(with(data_im, glm(vars~hyp)))`.
(The variables in `vars` take turns being the dependent variable in `glm()`, with `hyp` as the independent variable.)
Answer 1
Score: 0
For generating model formulas dynamically, look into `reformulate`. It may save you from the `paste`/`substitute`/`as.name` hell. I believe we do not need `with()` either; it may cause some confusion about environments.
vars <- c("AGE","SCORE","GENDER","HEART")
list_of_formulas <- lapply(vars, function(x) reformulate(termlabels = 'grou', response = x))
See the created list of formulas:
[[1]]
AGE ~ grou
<environment: 0x557191bff8b8>
[[2]]
SCORE ~ grou
<environment: 0x557191c07b58>
[[3]]
GENDER ~ grou
<environment: 0x557191cb30f8>
[[4]]
HEART ~ grou
<environment: 0x557191ccad40>
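A minimal base R sketch of the same idea, assuming the variable names from the question: it names each formula after its response variable and checks that `reformulate` yields the same formula text as the `as.formula(paste(...))` route. The names `from_paste` and `same_text` are introduced here purely for illustration.

```r
vars <- c("AGE", "SCORE", "GENDER", "HEART")

# One formula per response variable, named by that variable
list_of_formulas <- setNames(
  lapply(vars, function(x) reformulate(termlabels = "grou", response = x)),
  vars
)

# The same formulas built from strings, for comparison
from_paste <- lapply(vars, function(x) as.formula(paste(x, "~ grou")))

# Compare the deparsed text, since the formula environments differ
same_text <- mapply(function(a, b) identical(deparse(a), deparse(b)),
                    list_of_formulas, from_paste)
all(same_text)
```

Naming the list makes it easier to tell the pooled results apart later, for example via `names(list_of_formulas)`.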
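Then use this list in your for loop. `glm()` needs a data frame rather than the `mids` object itself, so fit the model on each completed data set from `complete()` and pool the fits:
# Fit one formula on every completed data set, then pool the fits
fit_pooled <- function(f) {
  fits <- lapply(seq_len(data_im$m), function(k) glm(f, data = complete(data_im, k)))
  pool(as.mira(fits))
}
pool_fitMICE <- list()
for(i in seq_along(list_of_formulas)){
  pool_fitMICE[[i]] <- fit_pooled(list_of_formulas[[i]])
}
We can also avoid initializing the output list and writing a for loop by using a loop function:
lapply(list_of_formulas, fit_pooled)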
Answer 2
Score: 0
You don't actually need to mess around with `eval` and `substitute` in this situation. You can create the formula normally (using `as.formula`, for example), and when the formula is evaluated inside `with`, it will automatically find the variables in the imputed data.
library(mice)
set.seed(123)
vars <- c("age", "bmi", "chl")
imps <- mice(nhanes, printFlag = FALSE)
mods <- list()
for(i in seq_along(vars)) {
mods[[i]] <- pool(with(imps,
glm(as.formula(paste0(vars[[i]], "~ hyp")))))
}
names(mods) <- vars
lapply(mods, summary)
#> $age
#>          term  estimate std.error statistic       df   p.value
#> 1 (Intercept) 0.5216374 0.4873196  1.070422 17.71793 0.2987954
#> 2         hyp 1.0163241 0.3744249  2.714360 19.13894 0.0136971
#>
#> $bmi
#>          term   estimate std.error   statistic       df      p.value
#> 1 (Intercept) 27.0037636  2.882558  9.36798555 16.45801 5.297937e-08
#> 2         hyp -0.1502389  2.191507 -0.06855506 18.35948 9.460849e-01
#>
#> $chl
#>          term  estimate std.error statistic       df      p.value
#> 1 (Intercept) 157.88698  24.65957  6.402665 20.91373 2.445428e-06
#> 2         hyp  29.06627  19.62684  1.480945 19.90150 1.542804e-01
<sup>Created on 2023-03-08 with reprex v2.0.2</sup>
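The lookup behaviour described above can be illustrated without mice. The sketch below assumes that `with()` on an ordinary data frame behaves analogously to `with()` on a `mids` object in this respect; the data frame `df`, its columns `resp_demo` and `grp_demo`, and the helper `make_outside` are hypothetical names made up for the example. A formula created elsewhere carries its own environment, so the model function looks there for the variables and fails, whereas a formula created inside `with()` picks up the data.

```r
df <- data.frame(resp_demo = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.7),
                 grp_demo  = c(0, 0, 0, 1, 1, 1))

# Formula created outside with(): its environment is the function frame,
# which does not contain resp_demo or grp_demo
make_outside <- function() resp_demo ~ grp_demo
f_outside <- make_outside()

outside_result <- tryCatch(
  { with(df, lm(f_outside)); "ok" },
  error = function(e) "error"
)

# Formula created inside with(): its environment is the data itself
inside_fit <- with(df, lm(as.formula("resp_demo ~ grp_demo")))

outside_result
coef(inside_fit)
```

This is only a sketch of the mechanism; the exact error path in the question's `substitute()` version may differ in detail.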
Edit: Putting output into a data.frame.
out <- data.frame(var = vars,
coef_int = 0,
coef_hyp = 0)
for(i in seq_along(vars)) {
out[i, 2:3] <- mods[[i]]$pooled$estimate
}
out
#>   var    coef_int   coef_hyp
#> 1 age   0.5216374  1.0163241
#> 2 bmi  27.0037636 -0.1502389
#> 3 chl 157.8869758 29.0662740
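If the standard error and p-value are wanted alongside the estimate, as in the question's `coef(summary(...))[2, c(1,2,4)]` attempt, one possible variant is sketched below. It assumes a mice version in which `summary()` of a pooled fit returns a data frame with `term`, `estimate`, `std.error` and `p.value` columns, as in the output shown above; the names `rows` and `result` are made up for the example.

```r
library(mice)

set.seed(123)
vars <- c("age", "bmi", "chl")
imps <- mice(nhanes, printFlag = FALSE)

# For each response, fit on every imputed data set, pool, keep the hyp row
rows <- lapply(vars, function(v) {
  fit <- with(imps, glm(as.formula(paste0(v, " ~ hyp"))))
  s <- summary(pool(fit))
  s <- s[s$term == "hyp", c("estimate", "std.error", "p.value")]
  cbind(var = v, s)
})

# One row per response variable
result <- do.call(rbind, rows)
result
```

Building the rows with `lapply` and binding them once avoids growing a data frame inside the loop.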