Statsmodels ARIMA(0,1,2) result different from Stata ARIMA(0,1,2)

Question

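While conducting an ARIMA analysis, I get different output from Stata 17 and from statsmodels.

When I ran:

    from statsmodels.tsa.arima.model import ARIMA

    # Fit ARIMA(0,1,2) to the log GDP series in levels
    re = ARIMA(df_log, order=(0, 1, 2))
    print(re.fit().summary())

the results were as follows: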
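
                                   SARIMAX Results
    ==============================================================================
    Dep. Variable:                    GDP   No. Observations:                   62
    Model:                 ARIMA(0, 1, 2)   Log Likelihood                  48.459
    Date:                Thu, 11 May 2023   AIC                            -90.918
    Time:                        02:21:08   BIC                            -84.585
    Sample:                    01-01-1960   HQIC                           -88.436
                             - 01-01-2021
    Covariance Type:                  opg
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    ma.L1          0.4751      0.142      3.349      0.001       0.197       0.753
    ma.L2         -0.0500      0.151     -0.332      0.740      -0.345       0.245
    sigma2         0.0119      0.002      6.720      0.000       0.008       0.015
    ==============================================================================
    Ljung-Box (L1) (Q):              4.11   Jarque-Bera (JB):                 3.62
    Prob(Q):                         0.04   Prob(JB):                         0.16
    Heteroskedasticity (H):          0.60   Skew:                             0.37
    Prob(H) (two-sided):             0.27   Kurtosis:                         3.94
    ==============================================================================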

However, when I take the same approach in Stata 17 with the same data, the results are as follows:

    arima log_gdp, arima(0,1,2)

    (setting optimization to BHHH)
    Iteration 0:   log likelihood =  51.833406
    Iteration 1:   log likelihood =  58.219464
    Iteration 2:   log likelihood =  59.750732
    Iteration 3:   log likelihood =  60.128641
    Iteration 4:   log likelihood =  60.183567
    (switching optimization to BFGS)
    Iteration 5:   log likelihood =  60.191613
    Iteration 6:   log likelihood =  60.192693
    Iteration 7:   log likelihood =  60.192721
    Iteration 8:   log likelihood =  60.192721

    ARIMA regression

    Sample: 1961 thru 2021                          Number of obs     =         61
                                                    Wald chi2(2)      =      17.21
    Log likelihood = 60.19272                       Prob > chi2       =     0.0002

    ------------------------------------------------------------------------------
                 |                 OPG
       D.log_gdp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    log_gdp      |
           _cons |   .0707899   .0085723     8.26   0.000     .0539885    .0875912
    -------------+----------------------------------------------------------------
    ARMA         |
              ma |
             L1. |   .1135653    .103465     1.10   0.272    -.0892223    .3163529
             L2. |  -.4008123   .1129969    -3.55   0.000    -.6222821   -.1793425
    -------------+----------------------------------------------------------------
          /sigma |   .0899162    .007283    12.35   0.000     .0756417    .1041907
    ------------------------------------------------------------------------------
    Note: The test of the variance against zero is one sided, and the two-sided
          confidence interval is truncated at zero.

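The results are different, so I would like to know whether I am missing something. However, if I fit the first-differenced data in statsmodels with an ARIMA(0,0,2) model, the results match Stata. I am using statsmodels version 0.13.5.

    # Fit ARIMA(0,0,2) to the first-differenced series
    re = ARIMA(df_log.diff().dropna(), order=(0, 0, 2))
    print(re.fit().summary())

The results were as follows: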

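                                   SARIMAX Results
    ==============================================================================
    Dep. Variable:                    GDP   No. Observations:                   61
    Model:                 ARIMA(0, 0, 2)   Log Likelihood                  60.193
    Date:                Thu, 11 May 2023   AIC                           -112.386
    Time:                        02:14:30   BIC                           -103.942
    Sample:                    01-01-1961   HQIC                          -109.076
                             - 01-01-2021
    Covariance Type:                  opg
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const          0.0708      0.009      8.258      0.000       0.054       0.088
    ma.L1          0.1136      0.103      1.098      0.272      -0.089       0.316
    ma.L2         -0.4008      0.113     -3.548      0.000      -0.622      -0.179
    sigma2         0.0081      0.001      6.174      0.000       0.006       0.011
    ==============================================================================
    Ljung-Box (L1) (Q):              0.00   Jarque-Bera (JB):                 2.57
    Prob(Q):                         0.99   Prob(JB):                         0.28
    Heteroskedasticity (H):          0.60   Skew:                             0.36
    Prob(H) (two-sided):             0.26   Kurtosis:                         3.71
    ==============================================================================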
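
One detail that can hide the match between the Stata output and the differenced statsmodels output: Stata reports `/sigma`, the standard deviation of the innovations, while statsmodels reports `sigma2`, their variance. The short check below uses only the numbers printed in the two outputs above; the variable names are made up for illustration.

```python
import math

# Values copied from the two outputs above.
stata_sigma = 0.0899162        # Stata "/sigma": std. dev. of the innovations
statsmodels_sigma2 = 0.0081    # statsmodels "sigma2": variance of the innovations

# Square Stata's standard deviation to put both on the variance scale.
stata_sigma2 = stata_sigma ** 2
print(f"Stata sigma squared: {stata_sigma2:.4f}")
print(f"statsmodels sigma2:  {statsmodels_sigma2:.4f}")

# The statsmodels summary prints four decimals, so compare at that precision.
same_variance = math.isclose(stata_sigma2, statsmodels_sigma2, abs_tol=5e-5)
print("Variances agree to 4 decimals:", same_variance)

# Differencing once drops the first observation: 62 levels -> 61 differences,
# which is why Stata reports 61 observations for D.log_gdp.
n_levels = 62
n_differences = n_levels - 1
print("Observations after first differencing:", n_differences)
```

The coefficients, log likelihood, and observation count already line up directly, so once the variance scale is accounted for, the ARIMA(0,0,2) fit on differenced data and the Stata fit appear to describe the same model.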

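# Answer 1

**Score**: 1

The results differ because Statsmodels does not automatically include a trend when the model has differencing. You can see that Stata's results have an extra parameter (`_cons`).

If you specify a model with a trend, then the results match pretty closely:

```python
re = ARIMA(df_log, order=(0, 1, 2), trend='t')
print(re.fit().summary())
```

gives: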

                                   SARIMAX Results
    ==============================================================================
    Dep. Variable:                  value   No. Observations:                   62
    Model:                 ARIMA(0, 1, 2)   Log Likelihood                  59.730
    Date:                Fri, 12 May 2023   AIC                           -111.461
    Time:                        22:58:31   BIC                           -103.017
    Sample:                    12-31-1960   HQIC                          -108.152
                             - 12-31-2021
    Covariance Type:                  opg
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    x1             0.0707      0.009      8.270      0.000       0.054       0.087
    ma.L1          0.1026      0.106      0.969      0.333      -0.105       0.310
    ma.L2         -0.3959      0.112     -3.525      0.000      -0.616      -0.176
    sigma2         0.0082      0.001      6.255      0.000       0.006       0.011
    ==============================================================================
    Ljung-Box (L1) (Q):              0.00   Jarque-Bera (JB):                 2.42
    Prob(Q):                         0.98   Prob(JB):                         0.30
    Heteroskedasticity (H):          0.58   Skew:                             0.31
    Prob(H) (two-sided):             0.22   Kurtosis:                         3.76
    ==============================================================================

    Warnings:
    [1] Covariance matrix calculated using the outer product of gradients (complex-step).
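
Why a linear trend in levels lines up with Stata's `_cons` can be sketched as follows, assuming (as I understand the statsmodels `ARIMA` parameterization) that the trend enters as a regression term with ARIMA errors. The symbols $\beta$, $u_t$, $\varepsilon_t$, $\theta_1$, and $\theta_2$ are notation introduced here for illustration; they do not appear in either package's output.

$$
\begin{aligned}
y_t &= \beta\, t + u_t, \qquad (1 - L)\, u_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} \\
\Delta y_t &= \beta + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
\end{aligned}
$$

Differencing turns the slope $\beta$ of the trend into a constant, so the `x1` coefficient (0.0707) plays the role of Stata's `_cons` (.0707899). With no trend term, $\beta$ is fixed at zero, which is the model behind the ARIMA(0, 1, 2) output with log likelihood 48.459.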

huangapple
  • Posted on May 11, 2023, 10:41:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76223805.html