Creating a dataframe from output of several linear regression models

Question

<details>
<summary>English:</summary>
I am running 5 simple linear regressions, then 1 multiple linear regression with all 5 predictors.

I can produce a dataframe with all betas from the 5 simple regression models, and a second dataframe with the adjusted betas from the multiple regression model. I would like to combine these dataframes in the most efficient way possible. I want the final product to look something like this:

Coefficient  (Simp) Est.  (Simp) Std. Error  (Mult) Adj. Est.  (Mult) Adj. Std. Error
FEV1         74.1         14.1               31.255            27.041
AGE          -3.10        1.33               -3.236            1.257
etc.

Here are the variables. For all models, MWT1Best is the outcome variable:

str(copd2)
$ AGE : int 77 79 80 56 65 67 67 83 72 75 ...
$ COPDSEVERITY: chr "SEVERE" "MODERATE" "MODERATE" "VERY SEVERE" ...
$ MWT1Best : int 120 176 201 210 210 216 237 237 237 240 ...
$ FEV1 : num 1.21 1.09 1.52 0.47 1.07 1.09 0.69 0.68 2.13 1.06 ...
$ gender : int 1 0 0 1 1 0 0 1 1 0 ...
$ comorbid2 : int 1 1 1 1 1 1 1 1 1 1 ...

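Code for the 5 simple linear regression models:

library(dplyr)     # recode(), group_by(), do()
library(reshape2)  # melt(); assumed source of melt(), the post does not say
library(broom)     # tidy()

# Recode severity as an integer score (this collapses the 3 dummy betas into 1 slope)
copd2$COPDSEVERITY <- recode(copd2$COPDSEVERITY, "MILD" = 0, "MODERATE" = 1, "SEVERE" = 2, "VERY SEVERE" = 3)

# Long format: one row per (outcome x, predictor name, predictor value)
mergedf.MWT <- melt(data.frame(x = copd2$MWT1Best,
                               FEV1 = copd2$FEV1,
                               AGE = copd2$AGE,
                               Gender = copd2$gender,
                               Severity = copd2$COPDSEVERITY,
                               Comorbid = copd2$comorbid2),
                    id.vars = "x")

# Fit one simple regression per predictor and stack the tidy output
MWT.simp <- mergedf.MWT %>% group_by(variable) %>% do(tidy(lm(x ~ value, data = .)))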

Simple linear regression output:

variable term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 FEV1 (Intercept) 280. 24.6 11.4 1.15e-19
2 FEV1 value 74.1 14.1 5.26 8.47e- 7
3 AGE (Intercept) 616. 93.4 6.60 2.14e- 9
4 AGE value -3.10 1.33 -2.34 2.13e- 2
5 Gender (Intercept) 380. 17.7 21.5 7.77e-39
6 Gender value 30.5 22.1 1.38 1.70e- 1
7 Severity (Intercept) 459. 16.4 28.0 1.60e-48
8 Severity value -50.1 11.0 -4.55 1.54e- 5
9 Comorbid (Intercept) 423. 15.6 27.0 3.13e-47
10 Comorbid value -43.0 21.1 -2.04 4.43e- 2

Multiple regression output using MWT.mult <- tidy(model):

term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 615. 116. 5.31 0.000000766
2 FEV1 31.3 27.0 1.16 0.251
3 AGE -3.24 1.26 -2.57 0.0117
4 copd$gender1 29.3 24.2 1.21 0.228
5 COPDSEVERITYMODERATE -25.9 29.0 -0.894 0.374
6 COPDSEVERITYSEVERE -42.7 42.4 -1.01 0.317
7 COPDSEVERITYVERY SEVERE -135. 60.6 -2.22 0.0289
8 comorbid1 -45.3 18.6 -2.44 0.0167

Problem 1: I lost some betas by coding COPDSEVERITY as an integer in the simple dataframe. Is there a way to have all 3 betas from the simple model show up in the simple dataframe that I created with the code that I used? I imagine the alternative would be to run the simple regression separately and manually merge the resulting output.

Output for lm(MWT1Best ~ COPDSEVERITY):

(Intercept)  COPDSEVERITYMODERATE  COPDSEVERITYSEVERE  COPDSEVERITYVERY SEVERE
458.08696    -51.08696             -89.42029           -167.21196

Problem 2: Is there a package that creates combined simple and multiple linear regression outputs? To combine these dataframes I did the following:

MWT.simp <- filter(MWT.simp, term=="value") #remove all intercepts
MWT.simp <- MWT.simp %>% select(variable, estimate, std.error) #select appropriate columns
MWT.mult <- MWT.mult %>% select(term, estimate, std.error) #select appropriate columns
MWT.mult <- MWT.mult %>% rename("variable" = "term") #rename to prepare for merge
MWT.compare <- merge(x = MWT.simp, y = MWT.mult, by = "variable", all.x = TRUE)

Output:

variable estimate.x std.error.x estimate.y std.error.y
1 FEV1 74.110667 14.089604 31.254582 27.041458
2 AGE -3.104007 1.326155 -3.235664 1.257062
3 Gender 30.510417 22.097009 NA NA
4 Severity -50.130769 11.017792 NA NA
5 Comorbid -42.951515 21.084591 NA NA

Upon viewing my output I realize that the variables Gender and Comorbid also need to be renamed across the two datasets, and that I didn't address the COPDSEVERITY issue. Before I go further I thought there must be a better way of doing this, as this is such a common way of presenting data in journals.

Thanks!
</details>
# Answer 1

**Score**: 2

I'll reduce your question to its core problem: fit the simple models and the multiple model, then compare them side by side. There are good packages that do this job.

# 5 linear regression models
ols1 <- lm(mpg ~ vs, data=mtcars)
ols2 <- lm(mpg ~ drat, data=mtcars)
ols3 <- lm(mpg ~ cyl, data=mtcars)
ols4 <- lm(mpg ~ disp, data=mtcars)
ols5 <- lm(mpg ~ vs + drat + cyl + disp, data=mtcars)

# model comparison
library(modelsummary)
modelsummary(list("simple 1" = ols1,
                  "simple 2" = ols2,
                  "simple 3" = ols3,
                  "simple 4" = ols4,
                  "multiple" = ols5))

[![enter image description here][1]][1]

[1]: https://i.stack.imgur.com/e5kD4.png
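If you want the comparison as a plain data frame rather than a formatted table, something along these lines may work. This is only a sketch in base R, not part of the answer above: mtcars stands in for copd2, cyl is turned into a factor to mimic COPDSEVERITY, and coef_table is a made-up helper name. Because every model is fit on the original columns (no melting, no integer recoding), a factor predictor keeps one beta per non-reference level and the term names line up between the simple and multiple models.

```r
# Stand-in data: treat cyl as a factor so each non-reference level
# gets its own coefficient, the way COPDSEVERITY does in the question.
dat <- transform(mtcars, cyl = factor(cyl))
outcome    <- "mpg"
predictors <- c("vs", "drat", "cyl", "disp")

# Hypothetical helper: estimates and standard errors of an lm fit,
# with the intercept row dropped.
coef_table <- function(fit) {
  cf <- coef(summary(fit))
  cf <- cf[rownames(cf) != "(Intercept)", , drop = FALSE]
  data.frame(term      = rownames(cf),
             estimate  = cf[, "Estimate"],
             std.error = cf[, "Std. Error"],
             row.names = NULL)
}

# One simple regression per predictor, stacked into a single data frame
simple <- do.call(rbind, lapply(predictors, function(p) {
  coef_table(lm(reformulate(p, response = outcome), data = dat))
}))

# One multiple regression containing all predictors
multiple <- coef_table(lm(reformulate(predictors, response = outcome), data = dat))

# Join on the term name; suffixes label unadjusted vs. adjusted columns
compare <- merge(simple, multiple, by = "term",
                 suffixes = c(".simple", ".adjusted"), sort = FALSE)
print(compare)
```

For the question's data, the same pattern would presumably use copd2, "MWT1Best" as the outcome, and COPDSEVERITY left as a character or factor column instead of being recoded to integers.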

huangapple
  • Posted on April 17, 2023, 04:41:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76030231.html