Statsmodels ARIMA (0,1,2) result different from Stata ARIMA(0,1,2)
Question
While running an ARIMA analysis, I get different output from Stata 17 and from statsmodels.
When I ran the following code:
re = ARIMA(df_log, order = (0,1,2))
print(re.fit().summary())
the results were as follows:
SARIMAX Results
Dep. Variable: GDP No. Observations: 62
Model: ARIMA(0, 1, 2) Log Likelihood 48.459
Date: Thu, 11 May 2023 AIC -90.918
Time: 02:21:08 BIC -84.585
Sample: 01-01-1960 HQIC -88.436
- 01-01-2021
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ma.L1 0.4751 0.142 3.349 0.001 0.197 0.753
ma.L2 -0.0500 0.151 -0.332 0.740 -0.345 0.245
sigma2 0.0119 0.002 6.720 0.000 0.008 0.015
Ljung-Box (L1) (Q): 4.11 Jarque-Bera (JB): 3.62
Prob(Q): 0.04 Prob(JB): 0.16
Heteroskedasticity (H): 0.60 Skew: 0.37
Prob(H) (two-sided): 0.27 Kurtosis: 3.94
However, when I fit the same model to the same data in Stata 17, the results were as follows:
arima log_gdp, arima(0,1,2)
(setting optimization to BHHH)
Iteration 0: log likelihood = 51.833406
Iteration 1: log likelihood = 58.219464
Iteration 2: log likelihood = 59.750732
Iteration 3: log likelihood = 60.128641
Iteration 4: log likelihood = 60.183567
(switching optimization to BFGS)
Iteration 5: log likelihood = 60.191613
Iteration 6: log likelihood = 60.192693
Iteration 7: log likelihood = 60.192721
Iteration 8: log likelihood = 60.192721
ARIMA regression
Sample: 1961 thru 2021 Number of obs = 61
Wald chi2(2) = 17.21
Log likelihood = 60.19272 Prob > chi2 = 0.0002
------------------------------------------------------------------------------
| OPG
D.log_gdp | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
log_gdp |
_cons | .0707899 .0085723 8.26 0.000 .0539885 .0875912
-------------+----------------------------------------------------------------
ARMA |
ma |
L1. | .1135653 .103465 1.10 0.272 -.0892223 .3163529
L2. | -.4008123 .1129969 -3.55 0.000 -.6222821 -.1793425
-------------+----------------------------------------------------------------
/sigma | .0899162 .007283 12.35 0.000 .0756417 .1041907
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.
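The results are different, so I would like to know whether I am missing something. However, if I fit ARIMA(0,0,2) in statsmodels to the first-differenced data, the results match Stata's. I am using statsmodels version 0.13.5.
re = ARIMA(df_log.diff().dropna(), order = (0,0,2))
print(re.fit().summary())
The results were as follows: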
SARIMAX Results
Dep. Variable: GDP No. Observations: 61
Model: ARIMA(0, 0, 2) Log Likelihood 60.193
Date: Thu, 11 May 2023 AIC -112.386
Time: 02:14:30 BIC -103.942
Sample: 01-01-1961 HQIC -109.076
- 01-01-2021
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
const 0.0708 0.009 8.258 0.000 0.054 0.088
ma.L1 0.1136 0.103 1.098 0.272 -0.089 0.316
ma.L2 -0.4008 0.113 -3.548 0.000 -0.622 -0.179
sigma2 0.0081 0.001 6.174 0.000 0.006 0.011
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 2.57
Prob(Q): 0.99 Prob(JB): 0.28
Heteroskedasticity (H): 0.60 Skew: 0.36
Prob(H) (two-sided): 0.26 Kurtosis: 3.71
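One detail matters when comparing the last rows of the two tables: Stata reports `/sigma`, the standard deviation of the innovations, while statsmodels reports `sigma2`, their variance. The short check below uses only the numbers printed above to confirm that the two agree up to rounding. The variable names are illustrative and not part of either package.

```python
import math

stata_sigma = 0.0899162      # /sigma from the Stata table above
statsmodels_sigma2 = 0.0081  # sigma2 from the statsmodels table above (printed to 4 decimals)

# Square Stata's standard deviation to put it on the same scale as sigma2.
implied_sigma2 = stata_sigma ** 2
print(f"{implied_sigma2:.4f}")

# Allow for the rounding in the statsmodels summary table.
sigmas_agree = math.isclose(implied_sigma2, statsmodels_sigma2, abs_tol=5e-5)
print(sigmas_agree)
```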
# Answer 1
**Score**: 1
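The results differ because statsmodels does not automatically include a trend term when the model involves differencing, whereas Stata does: its output has an extra parameter (`_cons`).
If you specify a model with a trend, the results match pretty closely:
```python
re = ARIMA(df_log, order=(0, 1, 2), trend='t')
print(re.fit().summary())
```
This gives: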
SARIMAX Results
==============================================================================
Dep. Variable: value No. Observations: 62
Model: ARIMA(0, 1, 2) Log Likelihood 59.730
Date: Fri, 12 May 2023 AIC -111.461
Time: 22:58:31 BIC -103.017
Sample: 12-31-1960 HQIC -108.152
- 12-31-2021
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
x1 0.0707 0.009 8.270 0.000 0.054 0.087
ma.L1 0.1026 0.106 0.969 0.333 -0.105 0.310
ma.L2 -0.3959 0.112 -3.525 0.000 -0.616 -0.176
sigma2 0.0082 0.001 6.255 0.000 0.006 0.011
==============================================================================
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 2.42
Prob(Q): 0.98 Prob(JB): 0.30
Heteroskedasticity (H): 0.58 Skew: 0.31
Prob(H) (two-sided): 0.22 Kurtosis: 3.76
==============================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).