英文:
How do I loop through a list of formulas in GEE model in R?
问题
I am trying to run a set of GEE models. However, each model takes about an hour to an hour and a half to run. This wouldnt normally be an issue, but given the amount of models i am trying to run, I am currently trying to implement parallel processing of these GEE models in hopes I can cut down on the overall run time.
In my attempts to do so however, I am facing an error that I believe is related to the gee model function I am using itself, geeglm()
from the geepack package. The error states:
Incompatible types (from language to character) in subassignment fix
Based on this other post, it appears that the error is coming from the formula is text and would be fixed if I just typed the formula outside of "" as normal:
however I am trying to loop through a list of formulas. Other posts I've seen involve looping through a list of variables in a glm, not necessarily the entire formula.
Here is my code:
library(parallel)
library(doParallel)
library(tidyverse)
library(geepack)
dt.model <- dt_long_cat
# Declare number of cluster to use
## Warning: more does NOT mean faster!!! As the objects and data sets need to be copied to each cluster!!
cl <- makeCluster(10)
# All packages must be loaded in the parallel environment
clusterEvalQ(cl, {
library(tidyverse)
library(geepack)
})
# export all objects to parallel environment - this is the slowest step
## if you have a large dataset they need to be exported to each cluster
clusterExport(cl
, varlist=c("dt.model")
, envir=environment())
models.to.run <-
list("ga.m1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use"
,"ga.m2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use"
,"ga.m3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use"
,"ga.me1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use + education"
,"ga.me2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use + education"
,"ga.me3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use + education")
# here you are looping to run the models using multiple cores
out.list <-
foreach(i=1:length(models.to.run)) %dopar% {
models.to.run[[i]] %>%
geeglm(formula=.,data=dt.model,id=cohort,family=gaussian,corstr="exch")
}
# Always remember to stopCluster - skipping this makes things very complicated
stopCluster(cl=cl)
out.list
英文:
I am trying to run a set of GEE models. However, each model takes about an hour to an hour and a half to run. This wouldnt normally be an issue, but given the amount of models i am trying to run, I am currently trying to implement parallel processing of these GEE models in hopes I can cut down on the overall run time.
In my attempts to do so however, I am facing an error that I believe is related to the gee model function I am using itself, geeglm()
from the geepack package. The error states:
Incompatible types (from language to character) in subassignment fix
Based on this other post, it appears that the error is coming from the formula is text and would be fixed if I just typed the formula outside of "" as normal: https://stackoverflow.com/questions/60117482/geeglm-giving-error-incompatible-types-from-language-to-character-in-subassig
however I am trying to loop through a list of formulas. Other posts I've seen involve looping through a list of variables in a glm, not necessarily the entire formula.
Here is my code:
library(parallel)
library(doParallel)
library(tidyverse)
library(geepack)
dt.model <- dt_long_cat
# Declare number of cluster to use
## Warning: more does NOT mean faster!!! As the objects and data sets need to be copied to each cluster!!
cl<-makeCluster(10)
# All packages must be loaded in the parallel environment
clusterEvalQ(cl, {
library(tidyverse)
library(geepack)
})
# export all objects to parallel environment - this is the slowest step
## if you have a large dataset they need to be exported to each cluster
clusterExport(cl
, varlist=c("dt.model")
, envir=environment())
models.to.run <-
list("ga.m1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use"
,"ga.m2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use"
,"ga.m3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use"
,"ga.me1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use + education"
,"ga.me2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use + education"
,"ga.me3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use + education")
# here you are looping to run the models using multiple cores
out.list <-
foreach(i=1:length(models.to.run)) %dopar% {
models.to.run[[i]] %>%
geeglm(formula=.,data=dt.model,id=cohort,family=gaussian,corstr="exch")
}
# Always remember to stopCluster - skipping this makes things very complicated
stopCluster(cl=cl)
out.list
答案1
得分: 1
你可以完美地列出公式。
library(geepack)
data(dietox) ## 从`geepack`包加载示例数据
## 制作公式列表
mf_lst <- list(fo1=Weight ~ Cu + Time,
fo2=Weight ~ Cu + Time + I(Time^2),
fo3=Weight ~ Cu + Time + I(Time^2) + I(Time^3))
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c('dietox', 'mf_lst'))
clusterEvalQ(cl, library(geepack))
res <- parLapply(cl, mf_lst, \(x) geeglm(x, data=dietox, id=Pig, family=gaussian(), corstr="exch"))
stopCluster(cl)
#### Gives
lapply(res, \(x) coef(summary(x)))
# $fo1
# Estimate Std.err Wald Pr(>|W|)
# (Intercept) 15.422 1.0250 226.373 0.000
# CuCu035 -0.835 1.5643 0.285 0.593
# CuCu175 1.773 1.8766 0.893 0.345
# Time 6.943 0.0796 7604.531 0.000
#
# $fo2
# Estimate Std.err Wald Pr(>|W|)
# (Intercept) 18.500 1.0238 326.525 0.00e+00
# CuCu035 -0.850 1.5633 0.296 5.87e-01
# CuCu175 1.766 1.8752 0.887 3.46e-01
# Time 5.624 0.1914 863.411 0.00e+00
# I(Time^2) 0.102 0.0131 60.010 9.44e-15
#
# $fo3
# Estimate Std.err Wald Pr(>|W|)
# (Intercept) 21.1275 1.03471 416.93 0.00e+00
# CuCu035 -0.8425 1.56393 0.29 5.90e-01
# CuCu175 1.7698 1.87586 0.89 3.45e-01
# Time 3.5876 0.30194 141.18 0.00e+00
# I(Time^2) 0.4786 0.04854 97.24 0.00e+00
# I(Time^3) -0.0194 0.00242 64.01 1.22e-15
注意:我已经为您翻译了代码的部分,现在只显示翻译后的内容,没有其他附加信息。
英文:
You can perfectly list
formulae.
library(geepack)
data(dietox) ## load example data from `geepack` package
## make list of formulae
mf_lst <- list(fo1=Weight ~ Cu + Time,
fo2=Weight ~ Cu + Time + I(Time^2),
fo3=Weight ~ Cu + Time + I(Time^2) + I(Time^3))
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c('dietox', 'mf_lst'))
clusterEvalQ(cl, library(geepack))
res <- parLapply(cl, mf_lst, \(x) geeglm(x, data=dietox, id=Pig, family=gaussian(), corstr="exch"))
stopCluster(cl)
Gives
lapply(res, \(x) coef(summary(x)))
# $fo1
# Estimate Std.err Wald Pr(>|W|)
# (Intercept) 15.422 1.0250 226.373 0.000
# CuCu035 -0.835 1.5643 0.285 0.593
# CuCu175 1.773 1.8766 0.893 0.345
# Time 6.943 0.0796 7604.531 0.000
#
# $fo2
# Estimate Std.err Wald Pr(>|W|)
# (Intercept) 18.500 1.0238 326.525 0.00e+00
# CuCu035 -0.850 1.5633 0.296 5.87e-01
# CuCu175 1.766 1.8752 0.887 3.46e-01
# Time 5.624 0.1914 863.411 0.00e+00
# I(Time^2) 0.102 0.0131 60.010 9.44e-15
#
# $fo3
# Estimate Std.err Wald Pr(>|W|)
# (Intercept) 21.1275 1.03471 416.93 0.00e+00
# CuCu035 -0.8425 1.56393 0.29 5.90e-01
# CuCu175 1.7698 1.87586 0.89 3.45e-01
# Time 3.5876 0.30194 141.18 0.00e+00
# I(Time^2) 0.4786 0.04854 97.24 0.00e+00
# I(Time^3) -0.0194 0.00242 64.01 1.22e-15
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论