

Linear model not producing slope or R-squared when the independent and dependent variable are the same





I have a data frame and am running a linear regression. When the same variable is used as both the independent and the dependent variable, the summary of the linear model does not return the slope and R-squared of 1 that I expected. Only the intercept of the model is reported. Why are a slope and R-squared of 1 not returned when the independent and dependent variables are the same?

aa <- data.frame(x = rnorm(10, 100, 5),
                 y = rnorm(10, 500, 2))

lm_mod1 <- lm(y ~ x, data = aa)
summary(lm_mod1) # works as it should, returning a slope and r-squared value
#> Call:
#> lm(formula = y ~ x, data = aa)
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.0241 -1.3874  0.5264  1.7933  2.2946 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 489.7413    14.2402  34.391 5.59e-10 ***
#> x             0.1008     0.1428   0.706      0.5    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Residual standard error: 2.491 on 8 degrees of freedom
#> Multiple R-squared:  0.05862,    Adjusted R-squared:  -0.05905 
#> F-statistic: 0.4982 on 1 and 8 DF,  p-value: 0.5003

lm_mod2 <- lm(x ~ x, data = aa)
#> Warning in model.matrix.default(mt, mf, contrasts): the response appeared on the
#> right-hand side and was dropped
#> Warning in model.matrix.default(mt, mf, contrasts): problem with term 1 in
#> model.matrix: no columns are assigned
summary(lm_mod2) # does not return a slope or r-squared value
#> Call:
#> lm(formula = x ~ x, data = aa)
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -10.6993  -2.9903  -0.6496   3.2495   8.4294 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   99.554      1.838   54.16 1.25e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Residual standard error: 5.813 on 9 degrees of freedom

<sup>Created on 2023-08-08 with reprex v2.0.2</sup>


Score: 2


The warning tells you what happened:
> the response appeared on the right-hand side and was dropped

So your formula effectively becomes x ~ 1, an intercept-only model that estimates nothing but the mean of x.
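A minimal sketch of that equivalence, assuming a freshly simulated data frame (the seed and the object names fit_self and fit_mean are illustrative and not from the question): the intercept from x ~ x should match both the intercept from x ~ 1 and the sample mean of x.

```r
set.seed(1)  # illustrative seed, not from the original post
aa <- data.frame(x = rnorm(10, 100, 5),
                 y = rnorm(10, 500, 2))

# lm() warns that the response was dropped; silence the warnings for this check
fit_self <- suppressWarnings(lm(x ~ x, data = aa))
fit_mean <- lm(x ~ 1, data = aa)

# Only an intercept survives in the x ~ x fit
names(coef(fit_self))

# The two intercepts agree with each other and with mean(x)
same_as_intercept_only <- isTRUE(all.equal(unname(coef(fit_self)),
                                           unname(coef(fit_mean))))
same_as_mean <- isTRUE(all.equal(unname(coef(fit_self)), mean(aa$x)))
same_as_intercept_only
same_as_mean
```

The 9 residual degrees of freedom in the question's output (10 observations minus one estimated parameter) are consistent with this intercept-only reading.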

The drop is deliberate. If you want to circumvent it, you can copy the variable under a different name:

aa$z <- aa$x

summary(lm(x ~ z, data = aa))
#> Call:
#> lm(formula = x ~ z, data = aa)
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -1.582e-15 -4.789e-16 -2.258e-16  6.682e-16  1.553e-15 
#> Coefficients:
#>               Estimate Std. Error    t value Pr(>|t|)    
#> (Intercept) -1.798e-14  6.334e-15 -2.838e+00   0.0219 *  
#> z            1.000e+00  6.534e-17  1.530e+16   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> Residual standard error: 9.623e-16 on 8 degrees of freedom
#> Multiple R-squared:      1,    Adjusted R-squared:      1 
#> F-statistic: 2.342e+32 on 1 and 8 DF,  p-value: < 2.2e-16
#> Warning message:
#> In summary.lm(lm(x ~ z, data = aa)) :
#>   essentially perfect fit: summary may be unreliable

You will see that you do indeed get a slope of 1 and an R-squared of 1, along with a warning that the fit is essentially perfect and the summary may therefore be unreliable.
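Because the printed intercept is a tiny floating-point residue rather than an exact zero, it can be easier to check the fit numerically than to read it off the summary. The following is a rough sketch under assumed values (the seed, the 1e-8 tolerance, and the variable names are illustrative choices):

```r
set.seed(42)  # illustrative seed, not from the original post
aa <- data.frame(x = rnorm(10, 100, 5))
aa$z <- aa$x  # copy under a different name so the term is not dropped

fit <- lm(x ~ z, data = aa)

# summary() may warn about an "essentially perfect fit"; silence it here
r2 <- suppressWarnings(summary(fit)$r.squared)

# Compare with a tolerance instead of testing for exact equality
slope_is_one      <- isTRUE(all.equal(unname(coef(fit)["z"]), 1))
intercept_is_zero <- abs(unname(coef(fit)["(Intercept)"])) < 1e-8
r2_is_one         <- isTRUE(all.equal(r2, 1))

c(slope_is_one, intercept_is_zero, r2_is_one)
```

The standard errors, t values, and p-values from such a fit are computed from residuals at the level of rounding error, which is why the summary flags them as unreliable.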

Dropping the response from the right-hand side is a feature, not a bug: R actively looks for right-hand-side variables that also appear on the left and drops them, and it's not clear why you would want to fit such a model anyway.
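If a formula is built programmatically and a self-regression could slip in unnoticed, one option is to capture the warnings instead of letting them scroll past. This is only a sketch: the collector variable msgs is hypothetical, and the exact warning text may differ by R version or locale.

```r
set.seed(1)  # illustrative seed, not from the original post
aa <- data.frame(x = rnorm(10, 100, 5))

msgs <- character(0)  # hypothetical collector for warning messages

fit <- withCallingHandlers(
  lm(x ~ x, data = aa),
  warning = function(w) {
    # Record the message, then continue fitting without printing the warning
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs              # the captured warning text
names(coef(fit))  # which coefficients survived the drop
```

Inspecting names(coef(fit)) afterwards is a locale-independent way to confirm that only the intercept remains.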


  • Published on 2023-08-09 00:52:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76861671.html



