Creating a dataframe from output of several linear regression models
Question
<details>
<summary>English:</summary>
I am running 5 simple linear regressions, then 1 multiple linear regression with all 5 predictors.
I can produce a dataframe with all betas from the 5 simple regression models, and a second dataframe with the adjusted betas from the multiple regression model. I would like to combine these dataframes in the most efficient way possible. I want the final product to look something like this:

    Coefficient   (Simp) Est.   (Simp) Std. Error   (Mult) Adj. Est.   (Mult) Adj. Std. Error
    FEV1          74.1          14.1                31.255             27.041
    AGE           -3.10         1.33                -3.236             1.257
    etc.
Here are the variables. For all models, MWT1Best is the outcome variable:

    str(copd2)
     $ AGE         : int  77 79 80 56 65 67 67 83 72 75 ...
     $ COPDSEVERITY: chr  "SEVERE" "MODERATE" "MODERATE" "VERY SEVERE" ...
     $ MWT1Best    : int  120 176 201 210 210 216 237 237 237 240 ...
     $ FEV1        : num  1.21 1.09 1.52 0.47 1.07 1.09 0.69 0.68 2.13 1.06 ...
     $ gender      : int  1 0 0 1 1 0 0 1 1 0 ...
     $ comorbid2   : int  1 1 1 1 1 1 1 1 1 1 ...
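Code for the 5 simple linear regression models:

    library(dplyr)     # recode(), group_by(), do(), %>% (assumed source of these functions)
    library(reshape2)  # melt() (assumed)
    library(broom)     # tidy()

    # Recode severity as an integer score (this collapses it to a single beta)
    copd2$COPDSEVERITY <- recode(copd2$COPDSEVERITY, "MILD" = 0, "MODERATE" = 1, "SEVERE" = 2, "VERY SEVERE" = 3)
    # Reshape to long format: one row per (outcome, predictor, value)
    mergedf.MWT <- melt(data.frame(x = copd2$MWT1Best,
                                   FEV1 = copd2$FEV1,
                                   AGE = copd2$AGE,
                                   Gender = copd2$gender,
                                   Severity = copd2$COPDSEVERITY,
                                   Comorbid = copd2$comorbid2),
                        id.vars = "x")
    # Fit one simple regression per predictor and stack the tidy outputs
    MWT.simp <- mergedf.MWT %>% group_by(variable) %>% do(tidy(lm(x ~ value, data = .)))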
Simple linear regression output:

       variable term        estimate std.error statistic  p.value
       <fct>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
     1 FEV1     (Intercept)   280.       24.6      11.4  1.15e-19
     2 FEV1     value          74.1      14.1       5.26 8.47e- 7
     3 AGE      (Intercept)   616.       93.4       6.60 2.14e- 9
     4 AGE      value          -3.10      1.33     -2.34 2.13e- 2
     5 Gender   (Intercept)   380.       17.7      21.5  7.77e-39
     6 Gender   value          30.5      22.1       1.38 1.70e- 1
     7 Severity (Intercept)   459.       16.4      28.0  1.60e-48
     8 Severity value         -50.1      11.0      -4.55 1.54e- 5
     9 Comorbid (Intercept)   423.       15.6      27.0  3.13e-47
    10 Comorbid value         -43.0      21.1      -2.04 4.43e- 2
Multiple regression output using `MWT.mult <- tidy(model)`:

      term                    estimate std.error statistic     p.value
      <chr>                      <dbl>     <dbl>     <dbl>       <dbl>
    1 (Intercept)               615.      116.       5.31  0.000000766
    2 FEV1                       31.3      27.0      1.16  0.251
    3 AGE                        -3.24      1.26    -2.57  0.0117
    4 copd$gender1               29.3      24.2      1.21  0.228
    5 COPDSEVERITYMODERATE      -25.9      29.0     -0.894 0.374
    6 COPDSEVERITYSEVERE        -42.7      42.4     -1.01  0.317
    7 COPDSEVERITYVERY SEVERE  -135.       60.6     -2.22  0.0289
    8 comorbid1                 -45.3      18.6     -2.44  0.0167
Problem 1: I lost some betas by coding COPDSEVERITY as an integer in the simple dataframe. Is there a way to have all 3 betas from the simple model show up on the simple dataframe that I created with the code that I used? I imagine the alternative would be to run the simple regression separately and manually merge the resulting output.
Output for `lm(MWT1Best ~ COPDSEVERITY)`:

    (Intercept)    COPDSEVERITYMODERATE      COPDSEVERITYSEVERE COPDSEVERITYVERY SEVERE
      458.08696               -51.08696               -89.42029              -167.21196
Problem 2: Is there a package that creates combined simple and multiple linear regression outputs? To combine these dataframes I did the following:

    MWT.simp <- filter(MWT.simp, term == "value")                  # remove all intercepts
    MWT.simp <- MWT.simp %>% select(variable, estimate, std.error) # select appropriate columns
    MWT.mult <- MWT.mult %>% select(term, estimate, std.error)     # select appropriate columns
    MWT.mult <- MWT.mult %>% rename("variable" = "term")           # rename to prepare for merge
    MWT.compare <- merge(x = MWT.simp, y = MWT.mult, by = "variable", all.x = TRUE)
Output:

      variable estimate.x std.error.x estimate.y std.error.y
    1     FEV1  74.110667   14.089604  31.254582   27.041458
    2      AGE  -3.104007    1.326155  -3.235664    1.257062
    3   Gender  30.510417   22.097009         NA          NA
    4 Severity -50.130769   11.017792         NA          NA
    5 Comorbid -42.951515   21.084591         NA          NA
Upon viewing my output I realize that the variables Gender and Comorbid also need to be renamed across the two datasets, and that I didn't address the COPDSEVERITY issue. Before I go further I thought there must be a better way of doing this, as this is such a common way of presenting data in journals.
Thanks!
</details>
# Answer 1
**Score**: 2
I'll reduce your question to its core problem: fit the five models and compare them side by side. There are good packages that do this job, for example modelsummary.

    # 5 linear regression models: four simple, one multiple
    ols1 <- lm(mpg ~ vs, data = mtcars)
    ols2 <- lm(mpg ~ drat, data = mtcars)
    ols3 <- lm(mpg ~ cyl, data = mtcars)
    ols4 <- lm(mpg ~ disp, data = mtcars)
    ols5 <- lm(mpg ~ vs + drat + cyl + disp, data = mtcars)

    # model comparison: one column per model, one row per coefficient
    library(modelsummary)
    modelsummary(list("simple 1" = ols1,
                      "simple 2" = ols2,
                      "simple 3" = ols3,
                      "simple 4" = ols4,
                      "multiple" = ols5))
[![enter image description here][1]][1]

[1]: https://i.stack.imgur.com/e5kD4.png
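If the goal is an actual data frame with unadjusted and adjusted estimates side by side, the following is a minimal sketch and not part of the original answer. It assumes the `broom` and `dplyr` packages are installed and again uses `mtcars` as a stand-in for `copd2`. Converting `cyl` to a factor is an illustrative assumption that mimics a categorical predictor such as `COPDSEVERITY`: left as a factor, it contributes one beta per non-reference level in both the simple and the multiple model, so the term names line up for the join.

```r
library(dplyr)
library(broom)

# Stand-in data; cyl as a factor plays the role of a categorical predictor.
dat <- mtcars
dat$cyl <- factor(dat$cyl)

outcome    <- "mpg"
predictors <- c("vs", "drat", "cyl", "disp")

# One simple regression per predictor; stack the tidy outputs.
simple <- bind_rows(lapply(predictors, function(p) {
  tidy(lm(reformulate(p, response = outcome), data = dat))
})) %>%
  filter(term != "(Intercept)") %>%   # drop the intercepts
  select(term, estimate, std.error)

# One multiple regression with all predictors.
multiple <- tidy(lm(reformulate(predictors, response = outcome), data = dat)) %>%
  filter(term != "(Intercept)") %>%
  select(term, estimate, std.error)

# Term names match because both sets of models use the same column names.
combined <- left_join(simple, multiple, by = "term",
                      suffix = c(".simple", ".adjusted"))
combined
```

Because every model is fitted on the same data frame with the same variable names, no renaming is needed before the join, which avoids the `NA` rows that appear when terms are labelled differently in the two tables.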