With MICE package, how to create a list of models from a list of variables to test with glm

Question

I want to run a t-test (or chi-squared test) to compare variables between the groups grou=0 and grou=1. All variables in the dataset were imputed with mice. The variables include AGE, SCORE, GENDER, HEART, etc.; AGE and SCORE are continuous, while GENDER and HEART are categorical.

If the t-test is done for only one variable at a time, I know the code is:

library(mice)
data_im <- mice(data, m=5, seed=6666)
summary(pool(with(data_im, glm(AGE~grou))))

The output p-value is also the p-value of the t-test.
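
On complete data this equivalence can be checked directly. The sketch below uses simulated values (made up purely for illustration) and compares the `glm` p-value with a Student's t-test that assumes equal variances (`var.equal = TRUE`); the default Welch test in `t.test` would generally give a slightly different p-value. With multiply imputed data the p-value comes from the pooled estimates, so this only illustrates the single-dataset case.

```r
# Simulated complete data; the numbers are assumptions for illustration only
set.seed(1)
grou <- rep(c(0, 1), each = 20)
AGE  <- rnorm(40, mean = 50 + 3 * grou, sd = 10)

# p-value for the group coefficient from a Gaussian glm
p_glm <- summary(glm(AGE ~ grou))$coefficients["grou", "Pr(>|t|)"]

# p-value from a two-sample t-test with pooled (equal) variance
p_t <- t.test(AGE ~ grou, var.equal = TRUE)$p.value

# The two should agree up to floating-point tolerance
all.equal(p_glm, p_t)
```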

However, there are too many variables to evaluate one by one, so I would like to write a for loop or a function that outputs the summary test results for multiple variables at once.

I have tried to write:

vars <- c("AGE","SCORE","GENDER","HEART")
afterMICE <- c()
for(i in 1:4){
  pool_fitMICE <- pool(with(data_im, glm(substitute(y ~ grou, list(y=as.name(vars[i]))))))
}

**Error in eval(predvars, data, env) : object 'AGE' not found**

afterMICE <- rbind(afterMICE, c(vars[i], coef(summary(pool_fitMICE))[2,c(1,2,4)]))

I know the reason for the error is that data_im is not a regular dataframe structure.

How can I modify the code to get batch output of the summary results for the different variables?

#------------------------------------------------------------------------------

Edit:
mice() is a function to do multivariate imputation for missing data. https://www.rdocumentation.org/packages/mice/versions/3.15.0/topics/mice

Take the data("nhanes2") for example and we can see the structure of data_im.

library(mice)
data("nhanes2")
vars <- c("bmi", "chl", "age", "hyp")
catvars <- c("age", "hyp")
data_im <- mice(nhanes2, m=5, seed=6666)
pool.fits <- pool(with(data_im, glm(age~hyp)))

But we need to pool the results of pool(with(data_im, glm(vars~hyp))) in batch: each variable in vars takes its turn as the dependent variable in glm(), with hyp as the independent variable.

Answer 1

Score: 0

To generate model formulas dynamically, you may want to look into `reformulate`. It can save you from the `paste`/`substitute`/`as.name` hell. I believe we do not need `with()` either; it may cause some environment confusion.

vars <- c("AGE","SCORE","GENDER","HEART")

list_of_formulas <- lapply(vars, \(x) reformulate(termlabels = 'grou', response = x))

See the created list of formulas:

[[1]]
AGE ~ grou
<environment: 0x557191bff8b8>

[[2]]
SCORE ~ grou
<environment: 0x557191c07b58>

[[3]]
GENDER ~ grou
<environment: 0x557191cb30f8>

[[4]]
HEART ~ grou
<environment: 0x557191ccad40>

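Then use this list in your for loop. Note that `glm()` expects a data frame in `data`, not the `mids` object itself, so fit each formula on every completed dataset from `complete()` and pool the fits:

pool_fitMICE <- list()
for (i in seq_along(list_of_formulas)) {
  # one glm fit per completed (imputed) dataset
  fits <- lapply(seq_len(data_im$m), \(k) glm(list_of_formulas[[i]], data = complete(data_im, k)))
  pool_fitMICE[[i]] <- pool(as.mira(fits))
}

We can also avoid initializing the output list and writing a for loop if we use a loop function:

lapply(list_of_formulas, \(f) pool(as.mira(lapply(seq_len(data_im$m), \(k) glm(f, data = complete(data_im, k))))))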
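
For a version that can be run as-is, here is a minimal sketch on the `nhanes` data that ships with mice. It builds each formula with `reformulate`, fits it on every completed dataset obtained from `complete()`, and pools the fits. The helper name `pool_one` and its arguments are hypothetical (not part of mice), and `hyp` stands in for `grou` only because `nhanes` has no such column.

```r
library(mice)

# Hypothetical helper (not part of mice): fit `response ~ predictor` on each
# completed dataset and pool the fits.
pool_one <- function(imp, response, predictor) {
  f <- reformulate(termlabels = predictor, response = response)
  fits <- lapply(seq_len(imp$m), function(k) glm(f, data = complete(imp, k)))
  pool(as.mira(fits))  # as.mira() wraps the list of fits so pool() can combine them
}

imp  <- mice(nhanes, m = 5, seed = 6666, printFlag = FALSE)
vars <- c("age", "bmi", "chl")

# Named list with one pooled model per response variable
pooled <- lapply(setNames(vars, vars), function(v) pool_one(imp, v, "hyp"))
lapply(pooled, summary)
```

Passing the variable names as strings keeps the helper independent of how the formula environment is resolved, since `glm()` looks the variables up in the data frame given in `data`.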

Answer 2

Score: 0

You actually don't need to mess around with `eval` and `substitute` in this situation. You can create the formula normally (using `as.formula`, for example), and when the formula is evaluated inside `with()` it will automatically find the variables in the imputed data.

library(mice)
set.seed(123)
vars <- c("age", "bmi", "chl")
imps <- mice(nhanes, printFlag = FALSE)
mods <- list()
for(i in seq_along(vars)) {
  mods[[i]] <- pool(with(imps,
                         glm(as.formula(paste0(vars[[i]], "~ hyp")))))
}
names(mods) <- vars
lapply(mods, summary)
#> $age
#>          term  estimate std.error statistic       df   p.value
#> 1 (Intercept) 0.5216374 0.4873196  1.070422 17.71793 0.2987954
#> 2         hyp 1.0163241 0.3744249  2.714360 19.13894 0.0136971
#> 
#> $bmi
#>          term   estimate std.error   statistic       df      p.value
#> 1 (Intercept) 27.0037636  2.882558  9.36798555 16.45801 5.297937e-08
#> 2         hyp -0.1502389  2.191507 -0.06855506 18.35948 9.460849e-01
#> 
#> $chl
#>          term  estimate std.error statistic       df      p.value
#> 1 (Intercept) 157.88698  24.65957  6.402665 20.91373 2.445428e-06
#> 2         hyp  29.06627  19.62684  1.480945 19.90150 1.542804e-01

<sup>Created on 2023-03-08 with reprex v2.0.2</sup>


Edit: Putting output into a data.frame.

out <- data.frame(var = vars,
                  coef_int = 0,
                  coef_hyp = 0)
for(i in seq_along(vars)) {
  out[i, 2:3] <- mods[[i]]$pooled$estimate
}
out
#>   var    coef_int   coef_hyp
#> 1 age   0.5216374  1.0163241
#> 2 bmi  27.0037636 -0.1502389
#> 3 chl 157.8869758 29.0662740
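
If the goal is one row per variable with the estimate, standard error and p-value of the group term (what the question tried to pull out with `coef(summary(...))[2, c(1,2,4)]`), a sketch along the following lines may help. It assumes that `summary()` of a pooled object returns a data frame with the columns `term`, `estimate`, `std.error` and `p.value`, as in the output above; the name `hyp_rows` is only illustrative.

```r
library(mice)

set.seed(123)
vars <- c("age", "bmi", "chl")
imps <- mice(nhanes, printFlag = FALSE)

# Pooled model for each response, as in the loop above
mods <- list()
for (i in seq_along(vars)) {
  mods[[i]] <- pool(with(imps, glm(as.formula(paste0(vars[[i]], "~ hyp")))))
}
names(mods) <- vars

# Keep only the row for the `hyp` term from each pooled summary
hyp_rows <- do.call(rbind, lapply(vars, function(v) {
  s <- summary(mods[[v]])
  hyp <- s[s$term == "hyp", ]
  data.frame(var = v,
             estimate = hyp$estimate,
             std.error = hyp$std.error,
             p.value = hyp$p.value)
}))
hyp_rows
```

Selecting the row by term name rather than by position avoids picking the wrong coefficient if the model later gains extra terms.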

huangapple
  • Published on 2023-02-24 01:08:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75548093.html