Different intercept values for linear regression using statsmodels and numpy polyfit

Question

I get two different intercept values from using the statsmodels regression fit and the numpy polyfit. The model is a simple linear regression with a single variable.

From the statsmodels regression I use:

results1 = smf.ols('np.log(NON_UND) ~ (np.log(Food_consumption))', data=Data2).fit()

This gives the following results:

                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept                    5.4433      0.270     20.154      0.000       4.911       5.976
np.log(Food_consumption)     1.1128      0.026     42.922      0.000       1.062       1.164

When plotting the data and adding a trendline using numpy polyfit, I receive a different intercept value:

x = np.array((np.log(Data2.Food_consumption)))
y = np.array((np.log(Data2.NON_UND)*100))

z = np.polyfit(x, y, 1)

array([ 1.11278898, 10.04846693])

How come I get two different values for the intercept?

Thanks in advance!

Answer 1

Score: 1

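This happens because the two fits are not the same model. Both take logs of the dependent and independent variables, but the polyfit version also applies a factor of 100 to y, which changes the fit. As the code is written, with the 100 outside the log, both coefficients would simply come out 100 times larger. The output you posted (same slope, intercept larger by about 4.605, which is log(100)) is what you get when the 100 is inside the log, since log(100 * NON_UND) = log(NON_UND) + log(100).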

To reproduce the statsmodels result with polyfit, fit exactly the same model, which means dropping the factor of 100. I suggest you do this:

x = np.log(np.array(Data2.Food_consumption))
y = np.log(np.array(Data2.NON_UND))

z = np.polyfit(x, y, 1)

polyfit should then return the same coefficients as the statsmodels fit. Note that polyfit lists the highest-degree coefficient first, so z[0] is the slope and z[1] is the intercept.
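To make the effect of the factor of 100 concrete, here is a minimal sketch on synthetic data. The arrays `food` and `non_und` are made-up stand-ins for the columns of Data2, and `np.linalg.lstsq` is used as a plain ordinary-least-squares stand-in for the statsmodels fit, so the numbers will not match the ones in the question.

```python
import numpy as np

# Synthetic stand-in for Data2 (hypothetical values, not the asker's data).
rng = np.random.default_rng(0)
food = rng.uniform(1e3, 1e5, size=200)
non_und = np.exp(5.4) * food**1.1 * np.exp(rng.normal(0.0, 0.1, size=200))

x = np.log(food)
y = np.log(non_und)

# Ordinary least squares via lstsq; design matrix columns are [1, x],
# so the solution is ordered [intercept, slope].
X = np.column_stack([np.ones_like(x), x])
intercept_ols, slope_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# polyfit returns the highest-degree coefficient first: [slope, intercept].
slope_pf, intercept_pf = np.polyfit(x, y, 1)

# Factor of 100 inside the log: slope unchanged, intercept shifted by log(100).
slope_in, intercept_in = np.polyfit(x, np.log(non_und * 100), 1)

# Factor of 100 outside the log: slope and intercept both scaled by 100.
slope_out, intercept_out = np.polyfit(x, np.log(non_und) * 100, 1)

print("lstsq OLS      :", slope_ols, intercept_ols)
print("polyfit        :", slope_pf, intercept_pf)
print("100 inside log :", slope_in, intercept_in)
print("100 outside log:", slope_out, intercept_out)
print("log(100)       :", np.log(100))
```

The first two lines of output should agree up to floating-point error, the third should differ from them only in the intercept (by log(100), about 4.605), and the fourth should be the second scaled by 100.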

huangapple
  • Posted on June 1, 2023 at 18:56:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76381180.html