Different intercept values for linear regression using statsmodels and numpy polyfit
Question
I get two different intercept values from using the statsmodels regression fit and the numpy polyfit. The model is a simple linear regression with a single variable.
From the statsmodels regression I use:
results1 = smf.ols('np.log(NON_UND) ~ (np.log(Food_consumption))', data=Data2).fit()
Where I receive the following results:
                             coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept                  5.4433      0.270     20.154      0.000       4.911       5.976
np.log(Food_consumption)   1.1128      0.026     42.922      0.000       1.062       1.164
When plotting the data and adding a trendline using numpy polyfit, I receive a different intercept value:
x = np.array((np.log(Data2.Food_consumption)))
y = np.array((np.log(Data2.NON_UND)*100))
z = np.polyfit(x, y, 1)
array([ 1.11278898, 10.04846693])
How come I get two different values for the intercept?
Thanks in advance!
Answer 1
Score: 1
This is because the two fits are not run on the same model. In the statsmodels regression the dependent variable is np.log(NON_UND), while in the np.polyfit call you multiply it by 100, so the two intercepts are not comparable. (The gap between the intercepts, 10.048 - 5.443 = 4.605 = ln(100), is exactly what you would expect if the factor of 100 ends up inside the logarithm, which would shift the intercept while leaving the slope unchanged.)
To reproduce the statsmodels result with np.polyfit, fit exactly the same variables, i.e. drop the factor of 100:
x = np.log(np.array(Data2.Food_consumption))
y = np.log(np.array(Data2.NON_UND))
z = np.polyfit(x, y, 1)
The coefficients returned by np.polyfit (slope first, then intercept for a degree-1 fit) should then match the statsmodels estimates.
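For reference, here is a minimal, self-contained sketch of the comparison. The data are synthetic stand-ins (the question's Data2 is not available here), but the column names follow the question:
# Minimal sketch with synthetic data; the numbers below are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
food = rng.uniform(50, 500, size=200)                                         # stand-in for Food_consumption
non_und = np.exp(5.44 + 1.11 * np.log(food) + rng.normal(0, 0.3, size=200))   # stand-in for NON_UND
Data2 = pd.DataFrame({"Food_consumption": food, "NON_UND": non_und})

# statsmodels: log-log model, intercept added automatically by the formula API
results1 = smf.ols('np.log(NON_UND) ~ np.log(Food_consumption)', data=Data2).fit()
print(results1.params)

# numpy: fit the same transformed variables; polyfit returns [slope, intercept] for degree 1
x = np.log(np.array(Data2.Food_consumption))
y = np.log(np.array(Data2.NON_UND))
print(np.polyfit(x, y, 1))           # matches results1.params up to floating-point noise

# Scaling NON_UND by 100 inside the log shifts only the intercept, by ln(100) ~ 4.605
y_scaled = np.log(np.array(100 * Data2.NON_UND))
print(np.polyfit(x, y_scaled, 1))    # same slope, larger intercept
Running this should print the same slope and intercept from both fits, and a second polyfit intercept larger by roughly 4.605 when the factor of 100 is included.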