August 9, 2023, 00:52:13

Linear model not producing slope or R-squared when independent and dependent variable are the same

Question

I have a data frame and am running a linear regression. When the same variable is used as both the independent and the dependent variable, the summary of the linear model does not return the slope and R-squared of 1 that I expected. Instead, only the intercept of the model is reported. Why are a slope and R-squared of 1 not returned when the independent and dependent variables are the same?

aa <- data.frame(x = rnorm(10, 100, 5),
                 y = rnorm(10, 500, 2))
lm_mod1 <- lm(y ~ x, data = aa)
summary(lm_mod1) # works as it should, returning a slope and R-squared value
#> 
#> Call:
#> lm(formula = y ~ x, data = aa)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.0241 -1.3874  0.5264  1.7933  2.2946 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 489.7413    14.2402  34.391 5.59e-10 ***
#> x             0.1008     0.1428   0.706      0.5    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.491 on 8 degrees of freedom
#> Multiple R-squared:  0.05862,    Adjusted R-squared:  -0.05905 
#> F-statistic: 0.4982 on 1 and 8 DF,  p-value: 0.5003
lm_mod2 <- lm(x ~ x, data = aa)
#> Warning in model.matrix.default(mt, mf, contrasts): the response appeared on the
#> right-hand side and was dropped
#> Warning in model.matrix.default(mt, mf, contrasts): problem with term 1 in
#> model.matrix: no columns are assigned
summary(lm_mod2) # does not return a slope or R-squared value
#> 
#> Call:
#> lm(formula = x ~ x, data = aa)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -10.6993  -2.9903  -0.6496   3.2495   8.4294 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   99.554      1.838   54.16 1.25e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 5.813 on 9 degrees of freedom

<sup>Created on 2023-08-08 with reprex v2.0.2</sup>

Answer 1

Score: 2

The warning tells you:
> the response appeared on the right-hand side and was dropped

So your formula effectively becomes x ~ 1 (an estimate of the mean of x only).
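To make that equivalence concrete, here is a minimal sketch. The seed and the object names `fit_self` and `fit_mean` are illustrative assumptions and do not come from the post. It compares the fit from `x ~ x` with an explicit intercept-only fit:

```r
# Illustrative data; the seed and object names are assumptions, not from the post.
set.seed(42)
aa <- data.frame(x = rnorm(10, 100, 5))

# lm() warns and drops x from the right-hand side, leaving only the intercept.
fit_self <- suppressWarnings(lm(x ~ x, data = aa))
# The explicit intercept-only model.
fit_mean <- lm(x ~ 1, data = aa)

# Both fits have a single coefficient, and it equals the sample mean of x.
names(coef(fit_self))
all.equal(unname(coef(fit_self)), unname(coef(fit_mean)))
all.equal(unname(coef(fit_self)), mean(aa$x))

# With only an intercept, the residual standard error is the sample
# standard deviation of x.
all.equal(sigma(fit_self), sd(aa$x))
```

With no predictor left in the model there is no slope to report, which is consistent with the summary in the question showing only an intercept and a residual standard error.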

This is done on purpose. If you want to circumvent it, you can copy the variable under a different name:

aa$z <- aa$x
lm(x ~ z, data = aa)
#> Call:
#> lm(formula = x ~ z, data = aa)
#> 
#> Coefficients:
#> (Intercept)            z  
#>  -1.798e-14    1.000e+00  
#> 
summary(lm(x ~ z, data = aa))
#> 
#> Call:
#> lm(formula = x ~ z, data = aa)
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -1.582e-15 -4.789e-16 -2.258e-16  6.682e-16  1.553e-15 
#> 
#> Coefficients:
#>               Estimate Std. Error    t value Pr(>|t|)    
#> (Intercept) -1.798e-14  6.334e-15 -2.838e+00   0.0219 *  
#> z            1.000e+00  6.534e-17  1.530e+16   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 9.623e-16 on 8 degrees of freedom
#> Multiple R-squared:      1,	Adjusted R-squared:      1 
#> F-statistic: 2.342e+32 on 1 and 8 DF,  p-value: < 2.2e-16
#> 
#> Warning message:
#> In summary.lm(lm(x ~ z, data = aa)) :
#>   essentially perfect fit: summary may be unreliable

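You will see that you do indeed get a slope of 1 and an R-squared of 1, along with a warning that the fit is essentially perfect and the summary may therefore be unreliable. The residuals and standard errors are at the level of floating-point rounding error, so the t and F statistics carry no real information.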
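If the goal is only to confirm those two values, the textbook simple-regression formulas give them without fitting a model at all. The following is a rough sketch under assumed data: the seed and the names `slope`, `intercept`, and `r_squared` are illustrative, not from the answer.

```r
# Illustrative data; the seed and variable names are assumptions.
set.seed(42)
aa <- data.frame(x = rnorm(10, 100, 5))
aa$z <- aa$x  # copy of x under a different name, as in the answer

# Closed-form quantities for a simple regression of x on z.
slope     <- cov(aa$x, aa$z) / var(aa$z)
intercept <- mean(aa$x) - slope * mean(aa$z)
r_squared <- cor(aa$x, aa$z)^2

# Compare with a tolerance rather than ==, since these are floating-point values.
all.equal(slope, 1)
all.equal(intercept, 0)
all.equal(r_squared, 1)
```

The tolerance-based comparison is the same reason the fitted intercept in the output above appears as a value around 1e-14 rather than exactly zero.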

This is a feature, not a bug: variables on the right-hand side that also appear on the left are actively sought and dropped, and it's not clear why you would want to do this anyway.

Curiosity?
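Because the drop is only signalled through warnings, it can go unnoticed in scripts. One possible way to surface it programmatically is to collect the warnings while fitting. This is a sketch under assumptions: the seed and the `msgs` and `fit` names are hypothetical, and the message text may differ in a non-English locale.

```r
# Illustrative data; the seed and object names are assumptions.
set.seed(42)
aa <- data.frame(x = rnorm(10, 100, 5))

# Collect every warning raised while fitting, then let the fit continue.
msgs <- character(0)
fit <- withCallingHandlers(
  lm(x ~ x, data = aa),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs              # the captured warning messages
names(coef(fit))  # only the intercept remains in the fitted model
```

Checking `names(coef(fit))` against the predictors you expected is a locale-independent way to notice that a term was dropped.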
