2023-04-17 04:41:37

Creating a dataframe from output of several linear regression models

Question

<details>
<summary>English:</summary>
I am running 5 simple linear regressions, then 1 multiple linear regression with all 5 predictors.
I can produce a dataframe with all betas from the 5 simple regression models, and a second dataframe with the adjusted betas from the multiple regression model. I would like to combine these dataframes in the most efficient way possible. I want the final product to look something like this:

    Coefficient   (Simp) Est.   (Simp) Std. Error   (Mult) Adj. Est.   (Mult) Adj. Std. Error
    FEV1          74.1          14.1                31.255             27.041
    AGE           -3.10         1.33                -3.236             1.257
    etc.


Here are the variables. For all models, MWT1Best is the outcome variable:

    str(copd2)
     $ AGE         : int  77 79 80 56 65 67 67 83 72 75 ...
     $ COPDSEVERITY: chr  "SEVERE" "MODERATE" "MODERATE" "VERY SEVERE" ...
     $ MWT1Best    : int  120 176 201 210 210 216 237 237 237 240 ...
     $ FEV1        : num  1.21 1.09 1.52 0.47 1.07 1.09 0.69 0.68 2.13 1.06 ...
     $ gender      : int  1 0 0 1 1 0 0 1 1 0 ...
     $ comorbid2   : int  1 1 1 1 1 1 1 1 1 1 ...


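Code for the 5 simple linear regression models:

    library(dplyr)     # recode(), group_by(), do(), %>%
    library(reshape2)  # melt() (assumed; the post does not name the package)
    library(broom)     # tidy()

    # Recode severity as an integer score (this collapses it to a single beta)
    copd2$COPDSEVERITY <- recode(copd2$COPDSEVERITY, "MILD" = 0, "MODERATE" = 1, "SEVERE" = 2, "VERY SEVERE" = 3)

    # Reshape to long format: one row per outcome/predictor pair
    mergedf.MWT <- melt(data.frame(x = copd2$MWT1Best,
                                   FEV1 = copd2$FEV1,
                                   AGE = copd2$AGE,
                                   Gender = copd2$gender,
                                   Severity = copd2$COPDSEVERITY,
                                   Comorbid = copd2$comorbid2),
                        id.vars = "x")

    # Fit one simple regression per predictor and stack the tidy output
    MWT.simp <- mergedf.MWT %>% group_by(variable) %>% do(tidy(lm(x ~ value, data = .)))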


Simple linear regression output:

       variable term        estimate std.error statistic  p.value
       <fct>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
     1 FEV1     (Intercept)   280.       24.6      11.4  1.15e-19
     2 FEV1     value          74.1      14.1       5.26 8.47e- 7
     3 AGE      (Intercept)   616.       93.4       6.60 2.14e- 9
     4 AGE      value          -3.10      1.33     -2.34 2.13e- 2
     5 Gender   (Intercept)   380.       17.7      21.5  7.77e-39
     6 Gender   value          30.5      22.1       1.38 1.70e- 1
     7 Severity (Intercept)   459.       16.4      28.0  1.60e-48
     8 Severity value         -50.1      11.0      -4.55 1.54e- 5
     9 Comorbid (Intercept)   423.       15.6      27.0  3.13e-47
    10 Comorbid value         -43.0      21.1      -2.04 4.43e- 2


Multiple regression output using MWT.mult <- tidy(model):

      term                    estimate std.error statistic     p.value
      <chr>                      <dbl>     <dbl>     <dbl>       <dbl>
    1 (Intercept)               615.      116.       5.31  0.000000766
    2 FEV1                       31.3      27.0      1.16  0.251
    3 AGE                        -3.24      1.26    -2.57  0.0117
    4 copd$gender1               29.3      24.2      1.21  0.228
    5 COPDSEVERITYMODERATE      -25.9      29.0     -0.894 0.374
    6 COPDSEVERITYSEVERE        -42.7      42.4     -1.01  0.317
    7 COPDSEVERITYVERY SEVERE  -135.       60.6     -2.22  0.0289
    8 comorbid1                 -45.3      18.6     -2.44  0.0167


Problem 1: I lost some betas by coding COPDSEVERITY as an integer in the simple dataframe. Is there a way to have all 3 betas from the simple model show up in the simple dataframe that I created with the code above? I imagine the alternative would be to run that simple regression separately and manually merge the resulting output.

Output for lm(MWT1Best ~ COPDSEVERITY):

    (Intercept)    COPDSEVERITYMODERATE      COPDSEVERITYSEVERE COPDSEVERITYVERY SEVERE
      458.08696               -51.08696               -89.42029              -167.21196
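
One way to keep every dummy-variable beta is to skip the melt step and fit each simple model directly from the original column, leaving the categorical predictor as a factor. The sketch below is only an illustration under assumptions: it uses the built-in mtcars data as a stand-in for copd2, with factor(cyl) playing the role of COPDSEVERITY, and the helper name fit_simple is hypothetical.

```r
# Stand-in data (assumption): mtcars replaces copd2, factor(cyl) mimics COPDSEVERITY.
dat <- transform(mtcars, cyl = factor(cyl))
predictors <- c("wt", "hp", "cyl")

# Hypothetical helper: fit mpg ~ <predictor> and keep every non-intercept row,
# so a factor predictor contributes one row per non-reference level.
fit_simple <- function(p) {
  fit <- lm(reformulate(p, response = "mpg"), data = dat)
  m <- coef(summary(fit))
  m <- m[rownames(m) != "(Intercept)", , drop = FALSE]
  data.frame(term = rownames(m),
             estimate = m[, "Estimate"],
             std.error = m[, "Std. Error"],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Stack the per-predictor tables into one data frame.
simple <- do.call(rbind, lapply(predictors, fit_simple))
print(simple)
```

Because the factor is passed to lm() unchanged, the row names (here cyl6 and cyl8) follow the same naming scheme that a multiple regression on the same columns would produce.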


Problem 2: Is there a package that creates combined simple and multiple linear regression outputs? To combine these dataframes I did the following:

    MWT.simp <- filter(MWT.simp, term == "value")                   # remove all intercepts
    MWT.simp <- MWT.simp %>% select(variable, estimate, std.error)  # select appropriate columns
    MWT.mult <- MWT.mult %>% select(term, estimate, std.error)      # select appropriate columns
    MWT.mult <- MWT.mult %>% rename("variable" = "term")            # rename to prepare for merge
    MWT.compare <- merge(x = MWT.simp, y = MWT.mult, by = "variable", all.x = TRUE)


Output:

      variable estimate.x std.error.x estimate.y std.error.y
    1     FEV1  74.110667   14.089604  31.254582   27.041458
    2      AGE  -3.104007    1.326155  -3.235664    1.257062
    3   Gender  30.510417   22.097009         NA          NA
    4 Severity -50.130769   11.017792         NA          NA
    5 Comorbid -42.951515   21.084591         NA          NA


Upon viewing my output I realize that the variables Gender and Comorbid also need to be renamed across the two datasets, and that I didn't address the COPDSEVERITY issue. Before I go further I thought there must be a better way of doing this, as this is such a common way of presenting data in journals.
Thanks!
</details>
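
The NA rows in the merged output come from term names that differ between the two tables (Gender vs. copd$gender1, Severity vs. COPDSEVERITYMODERATE, and so on). A minimal sketch of one way around this, assuming both the simple and the multiple models are fit from the same data frame and the same column names so the terms line up without renaming. It again uses mtcars as a stand-in for copd2, and coef_table is a hypothetical helper, not part of any package.

```r
# Stand-in data (assumption): mtcars replaces copd2, factor(cyl) mimics COPDSEVERITY.
dat <- transform(mtcars, cyl = factor(cyl))
predictors <- c("wt", "hp", "cyl")

# Hypothetical helper: non-intercept estimates and standard errors of a fitted lm,
# with column names prefixed by a label such as "simple" or "adj".
coef_table <- function(fit, label) {
  m <- coef(summary(fit))
  m <- m[rownames(m) != "(Intercept)", , drop = FALSE]
  out <- data.frame(term = rownames(m),
                    estimate = m[, "Estimate"],
                    std.error = m[, "Std. Error"],
                    row.names = NULL, stringsAsFactors = FALSE)
  names(out)[-1] <- paste(label, names(out)[-1], sep = ".")
  out
}

# One simple model per predictor, stacked into a single table.
simple <- do.call(rbind, lapply(predictors, function(p) {
  coef_table(lm(reformulate(p, response = "mpg"), data = dat), "simple")
}))

# One multiple model with all predictors together.
multiple <- coef_table(lm(reformulate(predictors, response = "mpg"), data = dat), "adj")

# Terms match exactly, so a plain merge on "term" leaves no NA rows.
combined <- merge(simple, multiple, by = "term")
print(combined)
```

The result has one row per coefficient, with the unadjusted and adjusted estimates and standard errors side by side, which is the layout sketched at the top of the question.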
# Answer 1
**Score**: 2
<details>
<summary>English:</summary>
I'll reduce your question to its core problem: fit the simple models and the multiple model, then compare them side by side. There are good packages that do the job, for example modelsummary.

    # 5 linear regression models

    ols1 <- lm(mpg ~ vs, data=mtcars)
    ols2 <- lm(mpg ~ drat, data=mtcars)
    ols3 <- lm(mpg ~ cyl, data=mtcars)
    ols4 <- lm(mpg ~ disp, data=mtcars)
    ols5 <- lm(mpg ~ vs + drat + cyl + disp, data=mtcars)

    # model comparison

    library(modelsummary)
    modelsummary(list("simple 1" = ols1,
                      "simple 2" = ols2,
                      "simple 3" = ols3,
                      "simple 4" = ols4,
                      "multiple" = ols5))

[![enter image description here][1]][1]
  [1]: https://i.stack.imgur.com/e5kD4.png
</details>
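
Since the question asks for a data frame rather than a rendered table, it may help that modelsummary() accepts an output argument. The sketch below assumes the modelsummary package is installed and uses output = "data.frame". The model names and the reduced set of predictors are illustrative only.

```r
library(modelsummary)  # assumed to be installed

# Same idea as the answer: simple models next to the multiple model.
models <- list(
  "simple vs"   = lm(mpg ~ vs, data = mtcars),
  "simple drat" = lm(mpg ~ drat, data = mtcars),
  "multiple"    = lm(mpg ~ vs + drat, data = mtcars)
)

# output = "data.frame" returns the comparison table as a data frame
# instead of rendering it, with one column per model.
tab <- modelsummary(models, output = "data.frame")
print(tab)
```

The cells of the returned data frame are formatted strings, so it suits further table layout better than numeric post-processing.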
