Why are the fitting results of the Arima() and glm() functions different?

Question

I am confused about the difference in the fitting results of the Arima() function and glm() function.

I want to fit an AR(1) model with an exogenous variable. Here is the equation:

$$
x_{t} = \alpha_{0} + \alpha_{1}x_{t-1} + \beta_{1}z_{t} + \epsilon_{t}
$$

I estimated this model with both the Arima() function and the glm() function and compared the results, but they turned out to be quite different!

Here is the sample data. x denotes the time-series variable, and z denotes the exogenous variable, as in the equation above.

library(forecast)
library(tidyverse)
data("Nile")
df <- 
  Nile %>% 
  as_tibble() %>% 
  mutate(x = as.numeric(x)) %>% 
  mutate(z = rnorm(100))

Then I fit the model using Arima() and glm() and compare the results.

fit_arima <- Arima(df$x, order = c(1, 0, 0), include.mean = TRUE, xreg = df$z)
tibble(Parameters = c("x lag", "intercept", "z"),
       Coefficients = coef(fit_arima),
       Standard_Errors = sqrt(diag(vcov(fit_arima))))  
fit_glm <- glm(df$x ~ lag(df$x) + df$z) 
tibble(Parameters = c("intercept", "x lag", "z"),
       Coefficients = coef(fit_glm),
       Standard_Errors = summary(fit_glm)$coefficients[, "Std. Error"])

The results are displayed as follows.

Arima() function:

# A tibble: 3 × 3
  Parameters Coefficients Standard_Errors
  <chr>             <dbl>           <dbl>
1 x lag             0.510          0.0868
2 intercept       920.            29.4   
3 z                 5.02          12.1    

glm() function:

# A tibble: 3 × 3
  Parameters Coefficients Standard_Errors
  <chr>             <dbl>           <dbl>
1 intercept       444.            83.4   
2 x lag             0.516          0.0896
3 z                 8.95          13.9 

The estimated coefficient and standard error of the x lag are quite close, but the values for the other two terms are very different. I find this puzzling because both Arima() and glm() use maximum likelihood estimation. Could you please explain why this difference arises and how I can fix it?

Answer 1

Score: 2

First, Arima() does not fit the model given in your equation. It fits a regression with ARIMA errors like this:

$$
x_{t} = \alpha_{0} + \beta_{1}z_{t} + \eta_{t}
$$

where

$$
\eta_t = \phi_{1}\eta_{t-1}+\varepsilon_{t}.
$$

We can rearrange this to give

$$
x_{t} = (1-\phi_{1})\alpha_{0} + \phi_{1}x_{t-1} + \beta_{1}z_{t} - \beta_{1}\phi_{1}z_{t-1} + \varepsilon_{t}
$$
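
For readers who want the intermediate steps, the rearrangement can be sketched as follows, using only the symbols already defined in this answer: solve the regression equation for the error term at times t and t-1, substitute both into the AR(1) recursion, and collect terms.

```latex
\begin{align*}
\eta_{t} &= x_{t} - \alpha_{0} - \beta_{1}z_{t}, \qquad
\eta_{t-1} = x_{t-1} - \alpha_{0} - \beta_{1}z_{t-1} \\
x_{t} - \alpha_{0} - \beta_{1}z_{t}
  &= \phi_{1}\left(x_{t-1} - \alpha_{0} - \beta_{1}z_{t-1}\right) + \varepsilon_{t} \\
x_{t} &= (1-\phi_{1})\alpha_{0} + \phi_{1}x_{t-1} + \beta_{1}z_{t}
  - \beta_{1}\phi_{1}z_{t-1} + \varepsilon_{t}
\end{align*}
```

Compared with the equation in the question, the lagged form carries an extra z_{t-1} term, and its constant is (1-\phi_{1})\alpha_{0} rather than \alpha_{0}.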

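This explains the major differences between the two results. The "intercept" reported by Arima() is $\alpha_{0}$, whereas the intercept in the glm() fit plays the role of $(1-\phi_{1})\alpha_{0}$; with the estimates in the question, (1 - 0.510) × 920 ≈ 451, which is close to the glm() intercept of 444. The glm() model also omits the $z_{t-1}$ term, so the two z coefficients come from different models and need not agree.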

Even if you specified exactly the same model, the two functions would still give slightly different results: Arima() uses the true (full) likelihood, whereas glm() effectively uses a conditional likelihood, because lag() leaves the first value missing and that observation is dropped from the fit.
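
As a rough illustration of both points, here is a minimal sketch rather than a definitive recipe. It assumes a fixed seed (the question does not set one, so the numbers will not match the tables there), uses stats::arima(), which forecast::Arima() wraps, to avoid extra dependencies, and uses lm(), which gives the same coefficient estimates as glm() with its default Gaussian family. The names fit_ml, fit_ols, implied and comparison are chosen here for illustration only.

```r
# Reproducible stand-in for the question's data; the seed is an assumption.
set.seed(123)
x <- as.numeric(Nile)
n <- length(x)
z <- rnorm(n)

# Regression with AR(1) errors, estimated by full maximum likelihood.
# The one-column matrix gives the regression coefficient the name "z".
fit_ml <- arima(x, order = c(1, 0, 0), xreg = cbind(z = z), method = "ML")

# Coefficients this fit implies for the lagged-regression form of the model.
phi    <- unname(coef(fit_ml)["ar1"])
alpha0 <- unname(coef(fit_ml)["intercept"])
beta1  <- unname(coef(fit_ml)["z"])
implied <- c("(Intercept)" = (1 - phi) * alpha0,
             x_lag = phi,
             z     = beta1,
             z_lag = -beta1 * phi)

# Unrestricted least-squares fit of the lagged form, including z_{t-1}.
# The first observation is dropped because its lag is unavailable,
# so this fit conditions on x[1].
d <- data.frame(x = x[-1], x_lag = x[-n], z = z[-1], z_lag = z[-n])
fit_ols <- lm(x ~ x_lag + z + z_lag, data = d)

# Side-by-side view of the two sets of coefficients.
comparison <- cbind(implied_by_arima = implied, ols = coef(fit_ols))
print(round(comparison, 3))
```

The two columns should be broadly similar but not identical: the least-squares fit estimates the z_lag coefficient freely instead of tying it to -\beta_{1}\phi_{1}, and it conditions on the first observation instead of using the full likelihood.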

See https://robjhyndman.com/hyndsight/arimax/ for a discussion of the different model specifications.

huangapple
  • Posted on June 2, 2023 at 12:23:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76387096.html