June 2, 2023, 12:23:29

Why are the fitting results of the Arima() and glm() functions different?

Question


I am confused about the difference in the fitting results of the Arima() function and glm() function.

I want to fit an AR(1) model with an exogenous variable. Here is the equation:

$$
x_{t} = \alpha_{0} + \alpha_{1}x_{t-1} + \beta_{1}z_{t} + \epsilon_{t}
$$

Now I estimate this model using the Arima() function and glm() function and compare the results, but the results turned out to be quite different!

Here is the sample data. x denotes the time-series variable, and z denotes the exogenous variable in the equation above.

library(forecast)
library(tidyverse)
data("Nile")
df <-
  Nile %>%
  as_tibble() %>%
  mutate(x = as.numeric(x)) %>%
  mutate(z = rnorm(100))

Then fit the model using Arima() and glm() and compare the results.

fit_arima <- Arima(df$x, order = c(1, 0, 0), include.mean = TRUE, xreg = df$z)
tibble(Parameters = c("x lag", "intercept", "z"),
       Coefficients = coef(fit_arima),
       Standard_Errors = sqrt(diag(vcov(fit_arima))))
fit_glm <- glm(df$x ~ lag(df$x) + df$z)
tibble(Parameters = c("intercept", "x lag", "z"),
       Coefficients = coef(fit_glm),
       Standard_Errors = summary(fit_glm)$coefficients[, "Std. Error"])

The results are displayed as follows.

Arima() function:

# A tibble: 3 × 3
  Parameters Coefficients Standard_Errors
  <chr>             <dbl>           <dbl>
1 x lag             0.510          0.0868
2 intercept       920.            29.4   
3 z                 5.02          12.1

glm() function:

# A tibble: 3 × 3
  Parameters Coefficients Standard_Errors
  <chr>             <dbl>           <dbl>
1 intercept       444.            83.4   
2 x lag             0.516          0.0896
3 z                 8.95          13.9

The estimated coefficient and standard error of the x lag are quite close, but the values for the other two parameters are very different. I find this puzzling because both Arima() and glm() use maximum likelihood estimation. Could you please explain why this difference arises and how I can fix it?

Answer 1

Score: 2


First, Arima() does not fit the model given in your equation. It fits a regression with ARIMA errors like this:

$$
x_{t} = \alpha_{0} + \beta_{1}z_{t} + \eta_{t}
$$

where

$$
\eta_t = \phi_{1}\eta_{t-1}+\varepsilon_{t}.
$$

We can rearrange this to give

$$
x_{t} = (1-\phi_{1})\alpha_{0} + \phi_{1}x_{t-1} + \beta_{1}z_{t} - \beta_{1}\phi_{1}z_{t-1} + \varepsilon_{t}
$$
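
For readers who want the intermediate steps, the rearrangement can be sketched as follows, using only the two equations of the regression with AR(1) errors. Writing the regression equation at time t-1 and solving for the error gives

$$
\eta_{t-1} = x_{t-1} - \alpha_{0} - \beta_{1}z_{t-1}
$$

Substituting this into the AR(1) equation for the error, and the result into the regression equation at time t, gives

$$
x_{t} = \alpha_{0} + \beta_{1}z_{t} + \phi_{1}\left(x_{t-1} - \alpha_{0} - \beta_{1}z_{t-1}\right) + \varepsilon_{t}
$$

Collecting terms yields the intercept $(1-\phi_{1})\alpha_{0}$ and the extra term $-\beta_{1}\phi_{1}z_{t-1}$, neither of which appears in the model fitted by glm().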

This explains the major differences in the two results.
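
As a rough check of this explanation, the numbers reported in the question can be plugged into the rearranged equation. The sketch below only does arithmetic on the rounded values from the two tables; the variable names are made up for illustration.

```r
# Values copied (rounded) from the Arima() and glm() tables in the question.
phi_arima     <- 0.510  # "x lag" (AR(1) coefficient) from Arima()
alpha0_arima  <- 920    # "intercept" from Arima(): the level alpha_0 of the regression
beta_arima    <- 5.02   # coefficient on z from Arima()
intercept_glm <- 444    # intercept from glm()

# Intercept implied by the rearranged equation: (1 - phi_1) * alpha_0
implied_intercept <- (1 - phi_arima) * alpha0_arima

# Coefficient on z_{t-1} implied by the rearranged equation: -beta_1 * phi_1
implied_lag_z <- -beta_arima * phi_arima

print(c(implied_intercept = implied_intercept,
        glm_intercept     = intercept_glm,
        implied_lag_z     = implied_lag_z))
```

The implied intercept comes out at about 451, which is on the same scale as the 444 reported by glm(). The two intercepts therefore measure different quantities rather than disagreeing about the same one. The coefficients on z are not expected to match either, because the glm() model omits the z_{t-1} term.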

But even if you specified exactly the same model, the two functions would give slightly different results: Arima() uses the true likelihood, whereas glm() uses a conditional likelihood, because the lag() function leaves an initial missing value and that observation is dropped from the fit.
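
The effect of the likelihood choice can be illustrated on a pure AR(1) model with no exogenous variable, so that both approaches target the same specification. This is a minimal sketch, not the code from the question: it assumes base R's arima() (which forecast's Arima() largely wraps) and swaps glm() for lm(), which gives the same point estimates for a Gaussian model. The helper to_regression() is a hypothetical name introduced here.

```r
# Nile ships with base R (package 'datasets'); no extra packages are needed.
x <- as.numeric(Nile)
n <- length(x)

# Exact Gaussian likelihood: uses all n observations.
fit_ml <- arima(x, order = c(1, 0, 0), method = "ML")

# Conditional sum of squares: conditions on the first observation.
fit_css <- arima(x, order = c(1, 0, 0), method = "CSS")

# OLS on the lagged series: the first observation is lost to the lag.
fit_ols <- lm(x[-1] ~ x[-n])

# arima() reports the series mean as "intercept"; convert it to the
# regression intercept (1 - phi) * mean so all fits are on the same scale.
to_regression <- function(fit) {
  phi <- unname(coef(fit)["ar1"])
  mu  <- unname(coef(fit)["intercept"])
  c(intercept = (1 - phi) * mu, x_lag = phi)
}

comparison <- rbind(
  ML  = to_regression(fit_ml),
  CSS = to_regression(fit_css),
  OLS = c(intercept = unname(coef(fit_ols)[1]),
          x_lag     = unname(coef(fit_ols)[2]))
)
print(comparison)
```

Under these assumptions, the CSS and OLS rows should agree up to optimizer tolerance, since both minimize the same conditional sum of squares. The ML row should differ slightly from them, because it also uses the information in the first observation.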

See https://robjhyndman.com/hyndsight/arimax/ for a discussion of the different model specifications.
