如何在R中循环遍历GEE模型中的公式列表?

huangapple go评论81阅读模式
英文:

How do I loop through a list of formulas in GEE model in R?

问题

I am trying to run a set of GEE models. However, each model takes about an hour to an hour and a half to run. This wouldnt normally be an issue, but given the amount of models i am trying to run, I am currently trying to implement parallel processing of these GEE models in hopes I can cut down on the overall run time.

In my attempts to do so however, I am facing an error that I believe is related to the gee model function I am using itself, geeglm() from the geepack package. The error states:

Incompatible types (from language to character) in subassignment fix

Based on this other post, it appears that the error is coming from the formula is text and would be fixed if I just typed the formula outside of "" as normal:

however I am trying to loop through a list of formulas. Other posts I've seen involve looping through a list of variables in a glm, not necessarily the entire formula.

Here is my code:

library(parallel)
library(doParallel)
library(tidyverse)
library(geepack)
dt.model <- dt_long_cat

# Declare number of cluster to use
## Warning: more does NOT mean faster!!! As the objects and data sets need to be copied to each cluster!!
cl <- makeCluster(10)

# All packages must be loaded in the parallel environment
clusterEvalQ(cl, {
   library(tidyverse)
   library(geepack)
 })

# export all objects to parallel environment - this is the slowest step
## if you have a large dataset they need to be exported to each cluster
clusterExport(cl
              , varlist=c("dt.model")
              , envir=environment())


models.to.run <-
  list("ga.m1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use"
       ,"ga.m2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use"
       ,"ga.m3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use"
       ,"ga.me1"="log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use + education"
       ,"ga.me2"="log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use + education"
       ,"ga.me3"="log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use + education")

# here you are looping to run the models using multiple cores
out.list <-
  foreach(i=1:length(models.to.run)) %dopar% {
    models.to.run[[i]] %>%
    geeglm(formula=.,data=dt.model,id=cohort,family=gaussian,corstr="exch")  
  }


# Always remember to stopCluster - skipping this makes things very complicated
stopCluster(cl=cl)

out.list

英文:

I am trying to run a set of GEE models. However, each model takes about an hour to an hour and a half to run. This wouldnt normally be an issue, but given the amount of models i am trying to run, I am currently trying to implement parallel processing of these GEE models in hopes I can cut down on the overall run time.

In my attempts to do so however, I am facing an error that I believe is related to the gee model function I am using itself, geeglm() from the geepack package. The error states:

Incompatible types (from language to character) in subassignment fix

Based on this other post, it appears that the error is coming from the formula is text and would be fixed if I just typed the formula outside of "" as normal: https://stackoverflow.com/questions/60117482/geeglm-giving-error-incompatible-types-from-language-to-character-in-subassig

however I am trying to loop through a list of formulas. Other posts I've seen involve looping through a list of variables in a glm, not necessarily the entire formula.

Here is my code:

library(parallel)
library(doParallel)
library(tidyverse)
library(geepack)
dt.model &lt;- dt_long_cat

# Declare number of cluster to use
## Warning: more does NOT mean faster!!! As the objects and data sets need to be copied to each cluster!!
cl&lt;-makeCluster(10)

# All packages must be loaded in the parallel environment
clusterEvalQ(cl, {
   library(tidyverse)
   library(geepack)
 })

# export all objects to parallel environment - this is the slowest step
## if you have a large dataset they need to be exported to each cluster
clusterExport(cl
              , varlist=c(&quot;dt.model&quot;)
              , envir=environment())


models.to.run &lt;-
  list(&quot;ga.m1&quot;=&quot;log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use&quot;
       ,&quot;ga.m2&quot;=&quot;log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use&quot;
       ,&quot;ga.m3&quot;=&quot;log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use&quot;
       ,&quot;ga.me1&quot;=&quot;log(ga) ~ exposure1 + mat_age+ par + season + bmi_cat + tobacco_use + education&quot;
       ,&quot;ga.me2&quot;=&quot;log(ga) ~ exposure2 + mat_age+ par + season + bmi_cat + tobacco_use + education&quot;
       ,&quot;ga.me3&quot;=&quot;log(ga) ~ exposure3 + mat_age+ par + season + bmi_cat + tobacco_use + education&quot;)

# here you are looping to run the models using multiple cores
out.list &lt;-
  foreach(i=1:length(models.to.run)) %dopar% {
    models.to.run[[i]] %&gt;%
    geeglm(formula=.,data=dt.model,id=cohort,family=gaussian,corstr=&quot;exch&quot;)  
  }


# Always remember to stopCluster - skipping this makes things very complicated
stopCluster(cl=cl)

out.list

答案1

得分: 1

你可以完美地列出公式。

library(geepack)

data(dietox)  ## 从`geepack`包加载示例数据

## 制作公式列表
mf_lst <- list(fo1=Weight ~ Cu + Time, 
               fo2=Weight ~ Cu + Time + I(Time^2),
               fo3=Weight ~ Cu + Time + I(Time^2) + I(Time^3))


library(parallel)

cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c('dietox', 'mf_lst'))
clusterEvalQ(cl, library(geepack))

res <- parLapply(cl, mf_lst, \(x) geeglm(x, data=dietox, id=Pig, family=gaussian(), corstr="exch"))

stopCluster(cl)

#### Gives

lapply(res, \(x) coef(summary(x)))
# $fo1
#             Estimate Std.err     Wald Pr(>|W|)
# (Intercept)   15.422  1.0250  226.373    0.000
# CuCu035       -0.835  1.5643    0.285    0.593
# CuCu175        1.773  1.8766    0.893    0.345
# Time           6.943  0.0796 7604.531    0.000
# 
# $fo2
#             Estimate Std.err    Wald Pr(>|W|)
# (Intercept)   18.500  1.0238 326.525 0.00e+00
# CuCu035       -0.850  1.5633   0.296 5.87e-01
# CuCu175        1.766  1.8752   0.887 3.46e-01
# Time           5.624  0.1914 863.411 0.00e+00
# I(Time^2)      0.102  0.0131  60.010 9.44e-15
# 
# $fo3
#             Estimate Std.err   Wald Pr(>|W|)
# (Intercept)  21.1275 1.03471 416.93 0.00e+00
# CuCu035      -0.8425 1.56393   0.29 5.90e-01
# CuCu175       1.7698 1.87586   0.89 3.45e-01
# Time          3.5876 0.30194 141.18 0.00e+00
# I(Time^2)     0.4786 0.04854  97.24 0.00e+00
# I(Time^3)    -0.0194 0.00242  64.01 1.22e-15

注意:我已经为您翻译了代码的部分,现在只显示翻译后的内容,没有其他附加信息。

英文:

You can perfectly list formulae.

library(geepack)

data(dietox)  ## load example data from `geepack` package

## make list of formulae
mf_lst &lt;- list(fo1=Weight ~ Cu + Time, 
               fo2=Weight ~ Cu + Time + I(Time^2),
               fo3=Weight ~ Cu + Time + I(Time^2) + I(Time^3))


library(parallel)

cl &lt;- makeCluster(detectCores() - 1)
clusterExport(cl, c(&#39;dietox&#39;, &#39;mf_lst&#39;))
clusterEvalQ(cl, library(geepack))

res &lt;- parLapply(cl, mf_lst, \(x) geeglm(x, data=dietox, id=Pig, family=gaussian(), corstr=&quot;exch&quot;))

stopCluster(cl)

Gives

lapply(res, \(x) coef(summary(x)))
# $fo1
#             Estimate Std.err     Wald Pr(&gt;|W|)
# (Intercept)   15.422  1.0250  226.373    0.000
# CuCu035       -0.835  1.5643    0.285    0.593
# CuCu175        1.773  1.8766    0.893    0.345
# Time           6.943  0.0796 7604.531    0.000
# 
# $fo2
#             Estimate Std.err    Wald Pr(&gt;|W|)
# (Intercept)   18.500  1.0238 326.525 0.00e+00
# CuCu035       -0.850  1.5633   0.296 5.87e-01
# CuCu175        1.766  1.8752   0.887 3.46e-01
# Time           5.624  0.1914 863.411 0.00e+00
# I(Time^2)      0.102  0.0131  60.010 9.44e-15
# 
# $fo3
#             Estimate Std.err   Wald Pr(&gt;|W|)
# (Intercept)  21.1275 1.03471 416.93 0.00e+00
# CuCu035      -0.8425 1.56393   0.29 5.90e-01
# CuCu175       1.7698 1.87586   0.89 3.45e-01
# Time          3.5876 0.30194 141.18 0.00e+00
# I(Time^2)     0.4786 0.04854  97.24 0.00e+00
# I(Time^3)    -0.0194 0.00242  64.01 1.22e-15

huangapple
  • 本文由 发表于 2023年6月6日 00:26:26
  • 转载请务必保留本文链接:https://go.coder-hub.com/76408338.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定