Why are the fitting results of the Arima() and glm() function different?
I am confused about the difference between the fitting results of the Arima() function and the glm() function.
I want to fit an AR(1) model with an exogenous variable. Here is the equation:
$$
x_{t} = \alpha_{0} + \alpha_{1}x_{t-1} + \beta_{1}z_{t} + \epsilon_{t}
$$
Now I estimate this model using the Arima() function and the glm() function and compare the results, but the results turn out to be quite different!
Here is the sample data. x denotes the time-series variable, and z denotes the exogenous variable from the equation above.
library(forecast)
library(tidyverse)
data("Nile")
df <- Nile %>%
  as_tibble() %>%
  mutate(x = as.numeric(x)) %>%
  mutate(z = rnorm(100))
Then I fit the model using Arima() and glm() and compare the results.
fit_arima <- Arima(df$x, order = c(1, 0, 0), include.mean = TRUE, xreg = df$z)
tibble(Parameters = c("x lag", "intercept", "z"),
       Coefficients = coef(fit_arima),
       Standard_Errors = sqrt(diag(vcov(fit_arima))))
fit_glm <- glm(df$x ~ lag(df$x) + df$z)
tibble(Parameters = c("intercept", "x lag", "z"),
       Coefficients = coef(fit_glm),
       Standard_Errors = summary(fit_glm)$coefficients[, "Std. Error"])
The results are displayed as follows.
Arima() function:
# A tibble: 3 × 3
Parameters Coefficients Standard_Errors
<chr> <dbl> <dbl>
1 x lag 0.510 0.0868
2 intercept 920. 29.4
3 z 5.02 12.1
glm() function:
# A tibble: 3 × 3
Parameters Coefficients Standard_Errors
<chr> <dbl> <dbl>
1 intercept 444. 83.4
2 x lag 0.516 0.0896
3 z 8.95 13.9
The estimated coefficient and standard error of the x lag are quite close, but the values of the other two parameters are very different. I find this puzzling because both the Arima() and glm() functions use maximum likelihood estimation. Could you please explain why this difference occurs and how I can fix it?
Answer 1 (score: 2)
First, Arima() does not fit the model given in your equation. It fits a regression with ARIMA errors, like this:
$$
x_{t} = \alpha_{0} + \beta_{1}z_{t} + \eta_{t}
$$
where
$$
\eta_{t} = \phi_{1}\eta_{t-1} + \varepsilon_{t}.
$$
We can rearrange this to give
$$
x_{t} = (1-\phi_{1})\alpha_{0} + \phi_{1}x_{t-1} + \beta_{1}z_{t} - \beta_{1}\phi_{1}z_{t-1} + \varepsilon_{t}
$$
This explains the major differences between the two results.
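To see this equivalence numerically, you can fit the rearranged lagged-regression form with lm() and compare its coefficients with the values implied by the Arima() fit. A minimal sketch (the set.seed() call is an assumption added for reproducibility; it is not part of the original example):

```r
library(forecast)

set.seed(42)  # assumption: seed added so the rnorm() draw is reproducible
x <- as.numeric(Nile)
z <- rnorm(length(x))

# Regression with AR(1) errors, which is what Arima() actually fits
fit_arima <- Arima(x, order = c(1, 0, 0), include.mean = TRUE, xreg = z)
phi  <- coef(fit_arima)["ar1"]
a0   <- coef(fit_arima)["intercept"]
beta <- coef(fit_arima)["xreg"]

# Rearranged form:
# x_t = (1 - phi)*a0 + phi*x_{t-1} + beta*z_t - beta*phi*z_{t-1} + e_t
n <- length(x)
fit_lm <- lm(x[-1] ~ x[-n] + z[-1] + z[-n])

# The OLS coefficients should be roughly consistent with the implied values
rbind(ols     = coef(fit_lm),
      implied = c((1 - phi) * a0, phi, beta, -beta * phi))
```

The lagged-z term in the rearranged form is exactly what `glm(x ~ lag(x) + z)` omits, which is why its intercept and z coefficient come out so different.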
But even if you specified exactly the same model, the two functions would still give slightly different results, because Arima() uses the full likelihood whereas glm() uses a conditional likelihood, due to the initial missing value created by lag().
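This full-versus-conditional distinction can also be seen within Arima() itself: its method argument switches between the exact Gaussian likelihood ("ML") and a conditional-sum-of-squares fit ("CSS"), the latter being closer in spirit to what lm()/glm() do on the lagged data. A minimal sketch:

```r
library(forecast)

x <- as.numeric(Nile)

# Exact (full) Gaussian likelihood
fit_ml  <- Arima(x, order = c(1, 0, 0), method = "ML")
# Conditional sum of squares: conditions on the first observation,
# analogous to dropping the initial NA that lag() creates
fit_css <- Arima(x, order = c(1, 0, 0), method = "CSS")

# The AR(1) coefficients differ slightly because of how the first
# observation is treated, not because the model itself differs
c(ML = coef(fit_ml)["ar1"], CSS = coef(fit_css)["ar1"])
```

The gap between the two estimates shrinks as the series gets longer, since the contribution of the first observation becomes negligible.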
See https://robjhyndman.com/hyndsight/arimax/ for a discussion of the different model specifications.