Efficiently and iteratively fit, diagnose, modify, and organize linear models in R (into one place)

Question

Oftentimes I need to fit several linear models (dozens) in R. These models may have the same predictor but varying response variables, data distribution families, and more.

What I've been doing: fitting models one by one, looking at diagnostic plots one by one, and altering the model as needed (see below).

My goal: fit models with functions that fit initial models using a specified set of predictors and response variables and families and datasets. Put those models into one place, like a dataframe, where I can access the model object itself as well as characteristics like the model's AIC or formula. Provide diagnostic plots with DHARMa. If necessary, modify the model and then update the dataframe to reflect that model's changes.

How can I achieve this ideal workflow? I've tried to make a reprex, below. Thanks for any advice!

    library(tidyverse)
    library(DHARMa)
    library(magrittr)
    library(glmmTMB)

    cars <- datasets::mtcars

    # 3 models, each with different response variables. Some (e.g. glm2)
    # have non-Gaussian families; some have filtered datasets
    lm1 <- glmmTMB(mpg ~ hp, data = cars)
    glm2 <- glmmTMB(cyl ~ hp,
                    data = cars,
                    family = "poisson")
    lm3 <- glmmTMB(drat ~ hp,
                   data = cars %>%
                     dplyr::filter(carb != 1))

    # next, look at diagnostic plots
    plot(DHARMa::simulateResiduals(lm1))
    plot(DHARMa::simulateResiduals(glm2))
    plot(DHARMa::simulateResiduals(lm3))

    # transform one model's response variable to satisfy assumptions
    glm2 <- glmmTMB(cyl^2 ~ hp,
                    data = cars,
                    family = "poisson")

    # check diagnostics again
    plot(DHARMa::simulateResiduals(glm2))

    # organize models into one place. A dataframe? A list? Somewhere where
    # I can access the model itself and the model formula so I can extract
    # those things to be used as arguments in other functions
    # ?

Answer 1

Score: 1

You can put "everything" into a dataframe, as long as the receiving column is a list-column (meaning each cell of that column holds a list element, which can be any R object). That's a convenient way to slice and dice your data and keep it and/or model formulae, recipes, plots, you name it, in a tabular structure. Just remember to wrap items in a list where necessary.
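To make the list-column idea concrete, here is a minimal sketch. It uses plain `lm()` fits instead of `glmmTMB` purely to stay small, and the table name `fits` and its column names are made up for illustration.

```r
library(tibble)

# Hypothetical toy table: one row per model, fitted objects live in a list-column
fits <- tibble(
  response = c("mpg", "drat"),
  model    = list(lm(mpg ~ hp, data = mtcars),
                  lm(drat ~ hp, data = mtcars))
)

# Characteristics derived from each model can sit next to the objects
fits$aic <- vapply(fits$model, AIC, numeric(1))

# A list-column cell is retrieved with [[ ]], which returns the model itself
f2 <- formula(fits$model[[2]])
```

Because each cell holds the full model object, anything that works on a single model (`summary()`, `AIC()`, `formula()`, a diagnostic function) can be applied to one cell or mapped over the whole column.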

For your example, you could:

  • create a dataframe of all desired combinations of model function, dependent, predictors etc.:

    the_grid <-
      expand.grid(dependent = c('mpg', 'cyl', 'drat'),
                  type = 'glmmTMB',
                  predictors = c('hp'),
                  predicate = c('TRUE', 'TRUE', 'carb != 1'),
                  stringsAsFactors = FALSE
      )
  • use this dataframe to add list-columns containing, e.g., a model and a plot per parameter combination:

    library(DHARMa)
    library(glmmTMB)
    library(dplyr)
    library(ggeffects) ## to plot model effects

    the_models <-
      the_grid |>
      rowwise() |> ## !important
      mutate(
        m = do.call(type,
                    args = list(formula = reformulate(response = dependent,
                                                      termlabels = predictors),
                                ## apply this row's predicate (e.g. 'carb != 1') as a row filter
                                data = filter(mtcars, eval(parse(text = predicate)))
                    )
        ) |> list(), ## don't forget to wrap the result in a list
        ## add a ggeffect plot object for fun:
        p = list(ggpredict(m) |> plot()) ## again, use a list
      )
    ## > the_models
    ## # A tibble: 9 x 6
    ## # Rowwise:
    ##   dependent type    predictors predicate m         p
    ##   <fct>     <fct>   <fct>      <fct>     <list>    <list>
    ## 1 mpg       glmmTMB hp         TRUE      <glmmTMB> <named list [1]>
    ## 2 cyl       glmmTMB hp         TRUE      <glmmTMB> <named list [1]>
    ## 3 drat      glmmTMB hp         TRUE      <glmmTMB> <named list [1]>
    ## 4 mpg       glmmTMB hp         TRUE      <glmmTMB> <named list [1]>
    ## 5 cyl       glmmTMB hp         TRUE      <glmmTMB> <named list [1]>
    ## 6 drat      glmmTMB hp         TRUE      <glmmTMB> <named list [1]>
    ## 7 mpg       glmmTMB hp         carb != 1 <glmmTMB> <named list [1]>
    ## 8 cyl       glmmTMB hp         carb != 1 <glmmTMB> <named list [1]>
  • example: model summary for row 1:

    ## > the_models$m[[1]] |> summary()
    ##  Family: gaussian  ( identity )
    ## Formula:          mpg ~ hp
    ## Data: filter(mtcars, TRUE)
    ##
    ##      AIC      BIC   logLik deviance df.resid
    ##    181.2    185.6    -87.6    175.2       29
    ##
    ##
    ## Dispersion estimate for gaussian family (sigma^2):   14
    ##
    ## Conditional model:
    ##              Estimate Std. Error z value Pr(>|z|)
    ## (Intercept) 30.098862   1.582037  19.025  < 2e-16 ***
    ## hp          -0.068228   0.009798  -6.964 3.32e-12 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • example: effect plot for row 5:

    the_models$p[[5]]
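The question also asks for AIC values, DHARMa diagnostics, and a way to update the table after a model is changed. One possible way to extend the list-column approach is sketched below. The names `specs`, `fit_spec`, and `refresh` are hypothetical helpers invented for this illustration, the spec table lists only the three models from the question instead of a full grid, and the added `wt` predictor is an arbitrary example of a modification.

```r
library(dplyr)
library(purrr)
library(glmmTMB)
library(DHARMa)

# Hypothetical spec table: one row per model that is actually wanted
specs <- tibble::tribble(
  ~id,    ~formula,    ~family,    ~predicate,
  "lm1",  "mpg ~ hp",  "gaussian", "TRUE",
  "glm2", "cyl ~ hp",  "poisson",  "TRUE",
  "lm3",  "drat ~ hp", "gaussian", "carb != 1"
)

# Hypothetical helper: fit one model from the strings of a spec row
fit_spec <- function(formula, family, predicate, data = mtcars) {
  keep <- eval(parse(text = predicate), envir = data)  # logical row filter
  do.call("glmmTMB", list(formula = as.formula(formula),
                          data    = data[keep, , drop = FALSE],
                          family  = family))
}

# Hypothetical helper: (re)compute every column derived from the spec columns
refresh <- function(tbl) {
  tbl |>
    mutate(
      m    = pmap(list(formula, family, predicate), fit_spec),
      aic  = map_dbl(m, AIC),
      res  = map(m, simulateResiduals, plot = FALSE),  # DHARMa residual objects
      ks_p = map_dbl(res, \(r) testUniformity(r, plot = FALSE)$p.value)
    )
}

models <- refresh(specs)

# Look at one model's DHARMa diagnostic plot
plot(models$res[[2]])

# Modify that model's spec, refit only that row, and swap it into the table
changed <- specs |>
  filter(id == "glm2") |>
  mutate(formula = "cyl ~ hp + wt")
models <- bind_rows(filter(models, id != "glm2"), refresh(changed))
```

Keeping the specification as plain string columns means the formula is always available next to the model object, and a change to one model is a change to one row followed by a call to the same `refresh()` step. Note that `bind_rows()` moves the updated row to the end of the table.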

  • Posted on June 8, 2023, 03:07:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76426384.html