Statsmodels ARIMA(0,1,2) results differ from Stata ARIMA(0,1,2)


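Question

The ARIMA output from Stata 17 and from statsmodels differs for the same data.

When I fit the model in statsmodels with

from statsmodels.tsa.arima.model import ARIMA

re = ARIMA(df_log, order=(0, 1, 2))
print(re.fit().summary())

the results are as follows: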
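                               SARIMAX Results
==============================================================================
Dep. Variable:                    GDP   No. Observations:                   62
Model:                 ARIMA(0, 1, 2)   Log Likelihood                  48.459
Date:                Thu, 11 May 2023   AIC                            -90.918
Time:                        02:21:08   BIC                            -84.585
Sample:                    01-01-1960   HQIC                           -88.436
                         - 01-01-2021
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ma.L1          0.4751      0.142      3.349      0.001       0.197       0.753
ma.L2         -0.0500      0.151     -0.332      0.740      -0.345       0.245
sigma2         0.0119      0.002      6.720      0.000       0.008       0.015
===================================================================================
Ljung-Box (L1) (Q):                   4.11   Jarque-Bera (JB):                 3.62
Prob(Q):                              0.04   Prob(JB):                         0.16
Heteroskedasticity (H):               0.60   Skew:                             0.37
Prob(H) (two-sided):                  0.27   Kurtosis:                         3.94
===================================================================================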

However, running the same model in Stata 17 on the same data gives:

arima log_gdp, arima(0,1,2)

(setting optimization to BHHH)
Iteration 0:   log likelihood =  51.833406  
Iteration 1:   log likelihood =  58.219464  
Iteration 2:   log likelihood =  59.750732  
Iteration 3:   log likelihood =  60.128641  
Iteration 4:   log likelihood =  60.183567  
(switching optimization to BFGS)
Iteration 5:   log likelihood =  60.191613  
Iteration 6:   log likelihood =  60.192693  
Iteration 7:   log likelihood =  60.192721  
Iteration 8:   log likelihood =  60.192721  

ARIMA regression

Sample: 1961 thru 2021                          Number of obs     =         61
                                                Wald chi2(2)      =      17.21
Log likelihood = 60.19272                       Prob > chi2       =     0.0002

------------------------------------------------------------------------------
             |                 OPG
   D.log_gdp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
log_gdp      |
       _cons |   .0707899   .0085723     8.26   0.000     .0539885    .0875912
-------------+----------------------------------------------------------------
ARMA         |
          ma |
         L1. |   .1135653    .103465     1.10   0.272    -.0892223    .3163529
         L2. |  -.4008123   .1129969    -3.55   0.000    -.6222821   -.1793425
-------------+----------------------------------------------------------------
      /sigma |   .0899162    .007283    12.35   0.000     .0756417    .1041907
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

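The results are different, so I am wondering whether I am missing something. However, if I first-difference the data myself and fit ARIMA(0,0,2) in statsmodels, the results match Stata. I am using statsmodels version 0.13.5.

re = ARIMA(df_log.diff().dropna(), order=(0, 0, 2))
print(re.fit().summary())

The results are as follows: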

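                               SARIMAX Results
==============================================================================
Dep. Variable:                    GDP   No. Observations:                   61
Model:                 ARIMA(0, 0, 2)   Log Likelihood                  60.193
Date:                Thu, 11 May 2023   AIC                           -112.386
Time:                        02:14:30   BIC                           -103.942
Sample:                    01-01-1961   HQIC                          -109.076
                         - 01-01-2021
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0708      0.009      8.258      0.000       0.054       0.088
ma.L1          0.1136      0.103      1.098      0.272      -0.089       0.316
ma.L2         -0.4008      0.113     -3.548      0.000      -0.622      -0.179
sigma2         0.0081      0.001      6.174      0.000       0.006       0.011
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 2.57
Prob(Q):                              0.99   Prob(JB):                         0.28
Heteroskedasticity (H):               0.60   Skew:                             0.36
Prob(H) (two-sided):                  0.26   Kurtosis:                         3.71
===================================================================================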
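
A note for anyone comparing the two outputs line by line: Stata reports `/sigma`, the standard deviation of the innovations, while statsmodels reports `sigma2`, their variance, so the two figures only agree after squaring. The following is a small arithmetic check using the numbers printed above; the variable names are made up for this illustration.

```python
# Values copied from the outputs above.
stata_sigma = 0.0899162       # /sigma from the Stata output (standard deviation)
statsmodels_sigma2 = 0.0081   # sigma2 from the statsmodels summary (variance, 4 decimals)

# Squaring Stata's standard deviation should reproduce the statsmodels variance.
implied_sigma2 = stata_sigma ** 2
print(round(implied_sigma2, 4), statsmodels_sigma2)
```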




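# Answer 1
**Score**: 1

The results differ because statsmodels does not automatically include a trend when the model involves differencing. You can see that Stata's output has an extra parameter (`_cons`).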
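
One rough way to see the extra parameter from the printed summaries alone: assuming the usual definition AIC = 2k - 2 logL, where k counts every estimated parameter including the variance, k can be backed out of the log likelihood and AIC shown in the question. The helper name `implied_param_count` is hypothetical and exists only for this sketch.

```python
def implied_param_count(log_likelihood: float, aic: float) -> int:
    """Back out k from AIC = 2k - 2*logL (assumed definition)."""
    return round((aic + 2 * log_likelihood) / 2)


# Log likelihood and AIC copied from the statsmodels summaries in the question.
k_levels = implied_param_count(48.459, -90.918)   # ARIMA(0,1,2) on the levels
k_diff = implied_param_count(60.193, -112.386)    # ARIMA(0,0,2) on the differenced series

print(k_levels)  # parameters listed there: ma.L1, ma.L2, sigma2
print(k_diff)    # parameters listed there: const, ma.L1, ma.L2, sigma2
```

The model fitted on the levels has one parameter fewer, which is the missing constant.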

If you specify a model with a trend, the results match quite closely. With `d = 1`, a linear time trend (`trend='t'`) in the levels plays the role of the constant in the differenced series, which is what Stata's `_cons` estimates:

```python
from statsmodels.tsa.arima.model import ARIMA

re = ARIMA(df_log, order=(0, 1, 2), trend='t')
print(re.fit().summary())
```

This gives:

```
                               SARIMAX Results
==============================================================================
Dep. Variable:                  value   No. Observations:                   62
Model:                 ARIMA(0, 1, 2)   Log Likelihood                  59.730
Date:                Fri, 12 May 2023   AIC                           -111.461
Time:                        22:58:31   BIC                           -103.017
Sample:                    12-31-1960   HQIC                          -108.152
                         - 12-31-2021
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0707      0.009      8.270      0.000       0.054       0.087
ma.L1          0.1026      0.106      0.969      0.333      -0.105       0.310
ma.L2         -0.3959      0.112     -3.525      0.000      -0.616      -0.176
sigma2         0.0082      0.001      6.255      0.000       0.006       0.011
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 2.42
Prob(Q):                              0.98   Prob(JB):                         0.30
Heteroskedasticity (H):               0.58   Skew:                             0.31
Prob(H) (two-sided):                  0.22   Kurtosis:                         3.76
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
```

huangapple
  • Posted on May 11, 2023 at 10:41:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76223805.html