February 24, 2023 01:08:02

With the mice package, how can I create a list of models from a list of variables to test with glm?

Question

I want to run a t-test (or chi-squared test) to estimate the difference in each variable between grou=0 and grou=1. All variables in the dataset are imputed by mice. The variables include AGE, SCORE, GENDER, HEART, etc.; AGE and SCORE are continuous, and GENDER and HEART are categorical.

If the t-test is done for only one variable at a time, I know the code is:

library(mice)
data_im <- mice(data, m=5, seed=6666)
summary(pool(with(data_im, glm(AGE~grou))))

The output p-value is also the p-value of the t-test.
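
As a quick check of that claim on complete data, the sketch below compares a Gaussian `glm()` with an equal-variance two-sample t-test. It uses the built-in `mtcars` data as a stand-in (an assumption for illustration, not the asker's dataset) and involves no imputation. The match holds for `var.equal = TRUE`; the default Welch test in `t.test()` generally gives a slightly different p-value.

```r
# Compare an equal-variance two-sample t-test with a Gaussian glm()
# on a complete built-in dataset (mtcars); no imputation is involved.
tt  <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
fit <- glm(mpg ~ am, data = mtcars)

p_ttest <- tt$p.value
p_glm   <- coef(summary(fit))["am", "Pr(>|t|)"]

# Both p-values come from the same t statistic on n - 2 degrees of freedom.
all.equal(p_ttest, p_glm)
```

With imputed data, the pooled p-value comes from combining the per-dataset fits, so it is the multiple-imputation analogue of this test rather than the output of a single t-test.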

However, I have many variables to evaluate, so I would like to write a for loop or a function that outputs the summary test results for multiple variables at once.

I have tried to write:

vars <- c("AGE","SCORE","GENDER","HEART")
afterMICE <- c()
for(i in 1:4){
  pool_fitMICE <- pool(with(data_im, glm(substitute(y ~ grou, list(y=as.name(vars[i]))))))
}
# Error in eval(predvars, data, env) : object 'AGE' not found
afterMICE <- rbind(afterMICE, c(vars[i], coef(summary(pool_fitMICE))[2,c(1,2,4)]))

I know the error occurs because data_im is not a regular data frame.

How can I modify the code to produce batch output of the summary results for the different variables?

#------------------------------------------------------------------------------

Edit:
mice() is a function to do multivariate imputation for missing data. https://www.rdocumentation.org/packages/mice/versions/3.15.0/topics/mice

Take data("nhanes2") as an example to see the structure of data_im.

library(mice)
data("nhanes2")
vars <- c("bmi", "chl", "age", "hyp")
catvars <- c("age", "hyp")
data_im <- mice(nhanes2, m=5, seed=6666)
pool.fits <- pool(with(data_im, glm(age~hyp)))

But we need to pool the results of pool(with(data_im, glm(vars ~ hyp))) in batch: each variable in vars takes a turn as the dependent variable in glm(), with hyp as the independent variable.
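
To make the structure of `data_im` concrete, here is a minimal sketch that inspects the object returned by `mice()` for `nhanes2`. The `printFlag = FALSE` argument and the name `first_completed` are additions for illustration; `complete()` is the mice function that extracts one imputed dataset as an ordinary data frame.

```r
library(mice)

# Impute the built-in nhanes2 data; printFlag = FALSE silences the iteration log.
data_im <- mice(nhanes2, m = 5, seed = 6666, printFlag = FALSE)

class(data_im)          # "mids"
is.data.frame(data_im)  # the imputation object itself is not a data frame
data_im$m               # number of imputed datasets

# complete() returns one imputed dataset as a regular data frame.
first_completed <- complete(data_im, 1)
is.data.frame(first_completed)
head(first_completed)
```

This is why model-fitting functions cannot use `data_im` directly: the model has to be fitted on each completed dataset, which is what `with()` does for a `mids` object.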

Answer 1

Score: 0

For dynamic generation of model formulas, take a look at `reformulate`. It can save you from the `paste`/`substitute`/`as.name` hell. Note that `with()` is still needed: `glm()` cannot take the `mids` object as its `data` argument, and `pool()` expects the repeated analyses that `with()` returns. Building the formula inside the `with()` call also avoids environment confusion, because the formula is then created where the variables of each completed dataset are visible.

vars <- c("AGE","SCORE","GENDER","HEART")

list_of_formulas <- lapply(vars, \(x) reformulate(termlabels = 'grou', response = x))

See the created list of formulas:

[[1]]
AGE ~ grou
<environment: 0x557191bff8b8>

[[2]]
SCORE ~ grou
<environment: 0x557191c07b58>

[[3]]
GENDER ~ grou
<environment: 0x557191cb30f8>

[[4]]
HEART ~ grou
<environment: 0x557191ccad40>

Then build the same formula inside `with()` in your for loop:

pool_fitMICE <- list()
for(i in seq_along(vars)){
  pool_fitMICE[[i]] <- pool(with(data_im, glm(reformulate(termlabels = 'grou', response = vars[i]))))
}

We can also avoid initializing the output list or writing a for loop if we use a loop function:

lapply(vars, \(v) pool(with(data_im, glm(reformulate(termlabels = 'grou', response = v)))))
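
Where a formula is built matters here, because a formula remembers the environment it was created in. The base R sketch below imitates, under my assumption of how `with()` behaves for imputed data, evaluating a model call inside one completed dataset. The data frame `completed` and its column names are made up for illustration.

```r
# Toy data frame standing in for one completed dataset (column names are made up).
completed <- data.frame(outcome_y = c(1.2, 2.3, 2.9, 4.1, 5.2),
                        group_x   = c(0, 0, 1, 1, 1))

# Formula built outside: its environment is the calling environment, not the data.
f_outside <- reformulate("group_x", response = "outcome_y")

# Evaluate the model call "inside" the data, roughly as with() would.
fit_outside <- try(eval(quote(glm(f_outside)), completed), silent = TRUE)
fit_inside  <- eval(quote(glm(reformulate("group_x", response = "outcome_y"))),
                    completed)

inherits(fit_outside, "try-error")  # the pre-built formula cannot see the columns
inherits(fit_inside, "glm")         # the formula built inside the call can
```

The pre-built formula fails because `glm()` looks for the variables in the formula's own environment when no `data` argument is given, which is the same kind of "object not found" error reported in the question.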

Answer 2

Score: 0

You don't actually need to mess around with `eval` and `substitute` in this situation. You can create the formula normally (using `as.formula`, for example), and when the formula is created inside `with()` it will automatically find the variables in the imputed data.

library(mice)
set.seed(123)
vars <- c("age", "bmi", "chl")
imps <- mice(nhanes, printFlag = FALSE)
mods <- list()
for(i in seq_along(vars)) {
  mods[[i]] <- pool(with(imps,
                         glm(as.formula(paste0(vars[[i]], "~ hyp")))))
}
names(mods) <- vars
lapply(mods, summary)
#> $age
#>          term  estimate std.error statistic       df   p.value
#> 1 (Intercept) 0.5216374 0.4873196  1.070422 17.71793 0.2987954
#> 2         hyp 1.0163241 0.3744249  2.714360 19.13894 0.0136971
#> 
#> $bmi
#>          term   estimate std.error   statistic       df      p.value
#> 1 (Intercept) 27.0037636  2.882558  9.36798555 16.45801 5.297937e-08
#> 2         hyp -0.1502389  2.191507 -0.06855506 18.35948 9.460849e-01
#> 
#> $chl
#>          term  estimate std.error statistic       df      p.value
#> 1 (Intercept) 157.88698  24.65957  6.402665 20.91373 2.445428e-06
#> 2         hyp  29.06627  19.62684  1.480945 19.90150 1.542804e-01

Created on 2023-03-08 with reprex v2.0.2

Edit: Putting output into a data.frame.

out <- data.frame(var = vars,
                  coef_int = 0,
                  coef_hyp = 0)
for(i in seq_along(vars)) {
  out[i, 2:3] <- mods[[i]]$pooled$estimate
}
out
#>   var    coef_int   coef_hyp
#> 1 age   0.5216374  1.0163241
#> 2 bmi  27.0037636 -0.1502389
#> 3 chl 157.8869758 29.0662740
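
The question also asked for the standard error and p-value of the group term, not only the coefficients. A possible sketch is shown below. It relies on the `term`, `estimate`, `std.error` and `p.value` columns that appear in the `summary()` output above; the names `rows` and `results` are made up for illustration.

```r
library(mice)
set.seed(123)

vars <- c("age", "bmi", "chl")
imps <- mice(nhanes, printFlag = FALSE)

# For each outcome: fit on every imputed dataset, pool, keep the row for `hyp`.
rows <- lapply(vars, function(v) {
  fit <- with(imps, glm(as.formula(paste0(v, " ~ hyp"))))
  s   <- as.data.frame(summary(pool(fit)))
  s   <- s[s$term == "hyp", c("estimate", "std.error", "p.value")]
  data.frame(var = v, s, row.names = NULL)
})

# One row per outcome variable.
results <- do.call(rbind, rows)
results
```

Selecting the row by `term == "hyp"` rather than by position keeps the code correct if the predictor is a factor with more than two levels, although in that case several rows per outcome would match and the filter would need adjusting.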
