Statsmodels ARIMA (0,1,2) result different from Stata ARIMA(0,1,2)
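Question
The output of an ARIMA analysis differs between Stata 17 and statsmodels.
When I ran the following code:
from statsmodels.tsa.arima.model import ARIMA
re = ARIMA(df_log, order=(0, 1, 2))
print(re.fit().summary())
the results were as follows: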
SARIMAX Results
Dep. Variable:                    GDP   No. Observations:                   62
Model:                 ARIMA(0, 1, 2)   Log Likelihood                  48.459
Date:                Thu, 11 May 2023   AIC                            -90.918
Time:                        02:21:08   BIC                            -84.585
Sample:                    01-01-1960   HQIC                           -88.436
- 01-01-2021                                      
Covariance Type:                  opg
             coef    std err          z      P>|z|      [0.025      0.975]
ma.L1          0.4751      0.142      3.349      0.001       0.197       0.753
ma.L2         -0.0500      0.151     -0.332      0.740      -0.345       0.245
sigma2         0.0119      0.002      6.720      0.000       0.008       0.015
Ljung-Box (L1) (Q):                   4.11   Jarque-Bera (JB):                 3.62
Prob(Q):                              0.04   Prob(JB):                         0.16
Heteroskedasticity (H):               0.60   Skew:                             0.37
Prob(H) (two-sided):                  0.27   Kurtosis:                         3.94
However, running the same model on the same data in Stata 17 gives:
arima log_gdp, arima(0,1,2)
(setting optimization to BHHH)
Iteration 0:   log likelihood =  51.833406  
Iteration 1:   log likelihood =  58.219464  
Iteration 2:   log likelihood =  59.750732  
Iteration 3:   log likelihood =  60.128641  
Iteration 4:   log likelihood =  60.183567  
(switching optimization to BFGS)
Iteration 5:   log likelihood =  60.191613  
Iteration 6:   log likelihood =  60.192693  
Iteration 7:   log likelihood =  60.192721  
Iteration 8:   log likelihood =  60.192721  
ARIMA regression
Sample: 1961 thru 2021                          Number of obs     =         61
                                                Wald chi2(2)      =      17.21
Log likelihood = 60.19272                       Prob > chi2       =     0.0002
------------------------------------------------------------------------------
             |                 OPG
   D.log_gdp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
log_gdp      |
       _cons |   .0707899   .0085723     8.26   0.000     .0539885    .0875912
-------------+----------------------------------------------------------------
ARMA         |
          ma |
         L1. |   .1135653    .103465     1.10   0.272    -.0892223    .3163529
         L2. |  -.4008123   .1129969    -3.55   0.000    -.6222821   -.1793425
-------------+----------------------------------------------------------------
      /sigma |   .0899162    .007283    12.35   0.000     .0756417    .1041907
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.
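The results differ, so I am wondering whether I am missing something. However, if I fit ARIMA(0,0,2) to the first-differenced data in statsmodels, the results match Stata's. I am using statsmodels version 0.13.5.
re = ARIMA(df_log.diff().dropna(), order=(0, 0, 2))
print(re.fit().summary())
which gives: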
SARIMAX Results
Dep. Variable:                    GDP   No. Observations:                   61
Model:                 ARIMA(0, 0, 2)   Log Likelihood                  60.193
Date:                Thu, 11 May 2023   AIC                           -112.386
Time:                        02:14:30   BIC                           -103.942
Sample:                    01-01-1961   HQIC                          -109.076
- 01-01-2021                                      
Covariance Type:                  opg
             coef    std err          z      P>|z|      [0.025      0.975]
const          0.0708      0.009      8.258      0.000       0.054       0.088
ma.L1          0.1136      0.103      1.098      0.272      -0.089       0.316
ma.L2         -0.4008      0.113     -3.548      0.000      -0.622      -0.179
sigma2         0.0081      0.001      6.174      0.000       0.006       0.011
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 2.57
Prob(Q):                              0.99   Prob(JB):                         0.28
Heteroskedasticity (H):               0.60   Skew:                             0.36
Prob(H) (two-sided):                  0.26   Kurtosis:                         3.71
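# Answer 1
**Score**: 1
The results differ because statsmodels does not automatically include a trend when the model includes differencing, whereas Stata estimates a constant (`_cons`). You can see that Stata's output has one extra parameter.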
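The extra parameter can be confirmed from the printed output alone. Assuming the usual definition AIC = 2k - 2*logL, the parameter count k can be backed out from each statsmodels summary in the question. The helper name `implied_param_count` is made up for this sketch, and the numbers are copied from the tables above. The snippet also squares Stata's `/sigma`, because Stata reports a standard deviation while statsmodels reports the variance `sigma2`.

```python
def implied_param_count(aic: float, llf: float) -> int:
    """Back out k from AIC = 2k - 2*llf (hypothetical helper, not a statsmodels call)."""
    return round((aic + 2.0 * llf) / 2.0)

# ARIMA(0,1,2) on the levels, no trend: ma.L1, ma.L2, sigma2
k_no_trend = implied_param_count(aic=-90.918, llf=48.459)

# ARIMA(0,0,2) on the differenced series: const, ma.L1, ma.L2, sigma2
k_with_const = implied_param_count(aic=-112.386, llf=60.193)

# Stata reports /sigma (a standard deviation); statsmodels reports sigma2 (a variance).
stata_sigma = 0.0899162
stata_sigma2 = stata_sigma ** 2

print(k_no_trend, k_with_const)  # 3 4
print(round(stata_sigma2, 4))    # 0.0081
```

So the undifferenced-input fit estimates three parameters and the fit that matches Stata estimates four, and Stata's `/sigma` squared agrees with the `sigma2` of 0.0081 reported for the differenced series.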
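If you specify a model with a trend, the results match quite closely:
```python
from statsmodels.tsa.arima.model import ARIMA
# trend='t' adds a linear trend in the levels, i.e. a drift term after differencing
re = ARIMA(df_log, order=(0, 1, 2), trend='t')
print(re.fit().summary())
```
which gives: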
                                   SARIMAX Results                                
==============================================================================
Dep. Variable:                  value   No. Observations:                   62
Model:                 ARIMA(0, 1, 2)   Log Likelihood                  59.730
Date:                Fri, 12 May 2023   AIC                           -111.461
Time:                        22:58:31   BIC                           -103.017
Sample:                    12-31-1960   HQIC                          -108.152
                             - 12-31-2021                                         
Covariance Type:                  opg                                         
==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0707      0.009      8.270      0.000       0.054       0.087
ma.L1          0.1026      0.106      0.969      0.333      -0.105       0.310
ma.L2         -0.3959      0.112     -3.525      0.000      -0.616      -0.176
sigma2         0.0082      0.001      6.255      0.000       0.006       0.011
==============================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 2.42
Prob(Q):                              0.98   Prob(JB):                         0.30
Heteroskedasticity (H):               0.58   Skew:                             0.31
Prob(H) (two-sided):                  0.22   Kurtosis:                         3.76
==============================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
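Why `trend='t'` lines up with Stata's `_cons`: with one order of differencing, a linear trend in the levels becomes a constant in the differenced series, so the `x1` slope above and Stata's `_cons` on `D.log_gdp` estimate the same drift. The following is a minimal standard-library sketch on simulated data. The function name `simulate_levels` and all parameter values are illustrative assumptions, not the original GDP series. It does not fit any model; it only shows that the mean of the differences recovers the drift.

```python
import random
from statistics import mean

def simulate_levels(n, drift, theta1, theta2, sigma, seed=0):
    """Simulate y_t = y_{t-1} + drift + e_t + theta1*e_{t-1} + theta2*e_{t-2}."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, sigma) for _ in range(n + 2)]
    y = [0.0]
    for t in range(2, n + 2):
        step = drift + e[t] + theta1 * e[t - 1] + theta2 * e[t - 2]
        y.append(y[-1] + step)
    return y

# Illustrative values, loosely based on the estimates printed above (assumption).
y = simulate_levels(n=5000, drift=0.07, theta1=0.11, theta2=-0.40, sigma=0.09)
dy = [b - a for a, b in zip(y, y[1:])]

# A pure linear trend in levels differences to a constant equal to its slope.
trend = [0.07 * t for t in range(10)]
d_trend = [b - a for a, b in zip(trend, trend[1:])]
trend_diff_is_constant = all(abs(d - 0.07) < 1e-12 for d in d_trend)

print(round(mean(dy), 3))      # close to the drift of 0.07
print(trend_diff_is_constant)  # True
```

A model with differencing and no trend forces that mean to zero, which is one reason the MA coefficients in the first statsmodels fit look so different.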