2023-06-06 00:26:26

How do I loop through a list of formulas in GEE model in R?

Question

I am trying to run a set of GEE models. However, each model takes an hour to an hour and a half to run. This wouldn't normally be an issue, but given the number of models I need to run, I am trying to fit them in parallel in the hope of cutting down the overall run time.

In my attempts to do so, however, I am getting an error that I believe comes from the GEE model function I am using, geeglm() from the geepack package. The error states:

Incompatible types (from language to character) in subassignment fix

Based on this other post (https://stackoverflow.com/questions/60117482/geeglm-giving-error-incompatible-types-from-language-to-character-in-subassig), it appears that the error occurs because the formula is passed as text, and that it would be fixed if I typed the formula without the surrounding "" as normal.

However, I am trying to loop through a list of formulas. Other posts I've seen involve looping through a list of variables in a glm, not necessarily the entire formula.

Here is my code:

library(parallel)
library(doParallel)
library(tidyverse)
library(geepack)
dt.model <- dt_long_cat
# Declare number of cluster to use
## Warning: more does NOT mean faster!!! As the objects and data sets need to be copied to each cluster!!
cl <- makeCluster(10)
# All packages must be loaded in the parallel environment
clusterEvalQ(cl, {
   library(tidyverse)
   library(geepack)
 })
# export all objects to parallel environment - this is the slowest step
## if you have a large dataset they need to be exported to each cluster
clusterExport(cl
              , varlist=c("dt.model")
              , envir=environment())
models.to.run <-
  list("ga.m1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use"
       ,"ga.m2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use"
       ,"ga.m3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use"
       ,"ga.me1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use + education"
       ,"ga.me2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use + education"
       ,"ga.me3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use + education")
# here you are looping to run the models using multiple cores
out.list <-
  foreach(i=1:length(models.to.run)) %dopar% {
    models.to.run[[i]] %>%
    geeglm(formula=.,data=dt.model,id=cohort,family=gaussian,corstr="exch")  
  }
# Always remember to stopCluster - skipping this makes things very complicated
stopCluster(cl=cl)
out.list


Answer 1

Score: 1

You can perfectly well put formulae in a list.

library(geepack)
data(dietox)  ## load example data from the `geepack` package
## make list of formulae
mf_lst <- list(fo1=Weight ~ Cu + Time, 
               fo2=Weight ~ Cu + Time + I(Time^2),
               fo3=Weight ~ Cu + Time + I(Time^2) + I(Time^3))
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c('dietox', 'mf_lst'))
clusterEvalQ(cl, library(geepack))
res <- parLapply(cl, mf_lst, \(x) geeglm(x, data=dietox, id=Pig, family=gaussian(), corstr="exch"))
stopCluster(cl)
#### Gives
lapply(res, \(x) coef(summary(x)))
# $fo1
#             Estimate Std.err     Wald Pr(>|W|)
# (Intercept)   15.422  1.0250  226.373    0.000
# CuCu035       -0.835  1.5643    0.285    0.593
# CuCu175        1.773  1.8766    0.893    0.345
# Time           6.943  0.0796 7604.531    0.000
# 
# $fo2
#             Estimate Std.err    Wald Pr(>|W|)
# (Intercept)   18.500  1.0238 326.525 0.00e+00
# CuCu035       -0.850  1.5633   0.296 5.87e-01
# CuCu175        1.766  1.8752   0.887 3.46e-01
# Time           5.624  0.1914 863.411 0.00e+00
# I(Time^2)      0.102  0.0131  60.010 9.44e-15
# 
# $fo3
#             Estimate Std.err   Wald Pr(>|W|)
# (Intercept)  21.1275 1.03471 416.93 0.00e+00
# CuCu035      -0.8425 1.56393   0.29 5.90e-01
# CuCu175       1.7698 1.87586   0.89 3.45e-01
# Time          3.5876 0.30194 141.18 0.00e+00
# I(Time^2)     0.4786 0.04854  97.24 0.00e+00
# I(Time^3)    -0.0194 0.00242  64.01 1.22e-15
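
If the formulas already exist as character strings, as in the question, one possible approach is to convert them to formula objects before fitting. The sketch below uses base R's as.formula() for this; the shortened formula strings and the name formula_list are illustrative assumptions, not part of the original code.

# Formula strings in the style of the question (shortened for illustration)
models.to.run <- list(
  ga.m1 = "log(ga) ~ exposure1 + mat_age",
  ga.m2 = "log(ga) ~ exposure2 + mat_age"
)
# Convert every string into a formula object; lapply keeps the list names
formula_list <- lapply(models.to.run, as.formula)
# Each element should now be of class "formula" rather than "character"
sapply(formula_list, class)

A list built this way should be usable in place of mf_lst above, provided the variables named in the formulas exist in the data set passed to geeglm().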

