Linear model does not produce a slope or R-squared when the independent and dependent variable are the same
Question
I have a data frame and am running a linear regression. When the same variable is used as both the independent and the dependent variable, the summary of the linear model does not return the slope and R-squared of 1 that I expected. Only the intercept of the model is reported. Why are a slope and an R-squared of 1 not returned when the independent and dependent variables are the same?
aa <- data.frame(x = rnorm(10, 100, 5),
                 y = rnorm(10, 500, 2))
lm_mod1 <- lm(y ~ x, data = aa)
summary(lm_mod1) # works as it should, returning a slope and r-squared value
#>
#> Call:
#> lm(formula = y ~ x, data = aa)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.0241 -1.3874 0.5264 1.7933 2.2946
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 489.7413 14.2402 34.391 5.59e-10 ***
#> x 0.1008 0.1428 0.706 0.5
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.491 on 8 degrees of freedom
#> Multiple R-squared: 0.05862, Adjusted R-squared: -0.05905
#> F-statistic: 0.4982 on 1 and 8 DF, p-value: 0.5003
lm_mod2 <- lm(x~x, data = aa)
#> Warning in model.matrix.default(mt, mf, contrasts): the response appeared on the
#> right-hand side and was dropped
#> Warning in model.matrix.default(mt, mf, contrasts): problem with term 1 in
#> model.matrix: no columns are assigned
summary(lm_mod2) # does not return a slope or r-squared value
#>
#> Call:
#> lm(formula = x ~ x, data = aa)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -10.6993 -2.9903 -0.6496 3.2495 8.4294
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 99.554 1.838 54.16 1.25e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 5.813 on 9 degrees of freedom
Created on 2023-08-08 with reprex v2.0.2
Answer 1
Score: 2
The warning tells you what happened:
> the response appeared on the right-hand side and was dropped
so your formula effectively becomes x ~ 1, an intercept-only model whose single coefficient is just an estimate of the mean of x.
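To check this claim directly, here is a minimal sketch that fits both x ~ x and x ~ 1 and compares the coefficients. The seed and the object names (fit_same, fit_mean) are illustrative assumptions, not part of the original question, and the warnings are suppressed only to keep the output short.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch repeatable
aa <- data.frame(x = rnorm(10, 100, 5))

# lm() warns here because the response is dropped from the right-hand side
fit_same <- suppressWarnings(lm(x ~ x, data = aa))
fit_mean <- lm(x ~ 1, data = aa)

# Both fits carry a single coefficient: the intercept
same_as_intercept_only <- isTRUE(all.equal(unname(coef(fit_same)),
                                           unname(coef(fit_mean))))

# That intercept is the sample mean of x
intercept_is_mean <- isTRUE(all.equal(unname(coef(fit_same)), mean(aa$x)))

print(c(same_as_intercept_only, intercept_is_mean))
```

Both checks are expected to come out TRUE, which also matches the 9 residual degrees of freedom in the question's output (10 observations minus one estimated coefficient).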
This is done on purpose. If you want to circumvent it, copy the variable under a different name:
aa$z <- aa$x
summary(lm(x ~ z, data = aa))
#>
#> Call:
#> lm(formula = x ~ z, data = aa)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.582e-15 -4.789e-16 -2.258e-16 6.682e-16 1.553e-15
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.798e-14 6.334e-15 -2.838e+00 0.0219 *
#> z 1.000e+00 6.534e-17 1.530e+16 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 9.623e-16 on 8 degrees of freedom
#> Multiple R-squared: 1, Adjusted R-squared: 1
#> F-statistic: 2.342e+32 on 1 and 8 DF, p-value: < 2.2e-16
#>
#> Warning message:
#> In summary.lm(lm(x ~ z, data = aa)) :
#> essentially perfect fit: summary may be unreliable
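You do indeed get a slope of 1 and an R-squared of 1, along with a warning that the fit is essentially perfect and the summary may therefore be unreliable. The tiny nonzero intercept (-1.798e-14) and its p-value of 0.0219 are floating-point round-off rather than a real effect, which is exactly the kind of unreliability the warning refers to.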
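Because the printed values are only exact up to round-off, it is safer to compare them with a tolerance than to test for exact equality. The sketch below assumes the same copy-the-column workaround; the seed, the 1e-8 threshold, and the variable names are illustrative choices.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch repeatable
aa <- data.frame(x = rnorm(10, 100, 5))
aa$z <- aa$x  # copy under a new name so the term is not dropped

fit <- lm(x ~ z, data = aa)

# summary() warns about an "essentially perfect fit"; suppressed here
s <- suppressWarnings(summary(fit))

# Compare with a tolerance instead of ==, since round-off is expected
slope_is_one      <- isTRUE(all.equal(unname(coef(fit)["z"]), 1))
intercept_is_zero <- abs(unname(coef(fit)["(Intercept)"])) < 1e-8
r2_is_one         <- isTRUE(all.equal(s$r.squared, 1))

print(c(slope_is_one, intercept_is_zero, r2_is_one))
```

all.equal() uses a relative tolerance of about 1.5e-8 by default, so it treats a slope such as 1.000e+00 with round-off in the last digits as equal to 1.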
This is a feature, not a bug: variables on the right-hand side that also appear on the left are actively sought out and dropped. It's not clear why you would want to do this anyway.
Curiosity?