英文:
When does initialisation change the results in the (R) KFAS package?
问题
I'm trying to get an idea about the importance of the initialisation for the filtering and smoothing results of KFAS. The code below first replicates the random walk with drift example in the KFAS article with the "exact diffuse initialisation". Then I do the same with the more naive explicit initialisation P1 <- matrix(c(1e7, 0, 0, 1e7), 2, 2)
. The results coincide almost perfectly (as I would expect in a simple example like this). Everything fine so far.
现在我更改了数据集。("tdx" 是我想使用状态空间模型分析的一系列税率。)使用 P1 <- matrix(0, 2, 2); P1inf <- diag(2)
的版本给出了合理的滤波和平滑结果。但是使用 P1 <- matrix(c(1e7, 0, 0, 1e7), 2, 2)
的版本看起来完全没有意义。它与参考版本根本不一致,并显示出发散的趋势。
有人能解释一下这里发生了什么吗?
(显然,我无法在这里附加数据文件。它不大(大约100个数据点)。我将很乐意将其发送给愿意尝试的任何人。)
我使用了替代数据集运行了代码。我预期两个模型的结果非常接近,偏差集中在系列开始(初始化)。实际上,这两个模型在根本上不同,第二个模型不能成为合理的滤波结果。
英文:
I'm trying to get an idea about the importance of the initialisation for the filtering and smoothing results of KFAS. The code below first replicates the random walk with drift example in the KFAS article with the "exact diffuse initialisation". Then I do the same with the more naive explicit initialisation P1 <- matrix(c(1e7, 0, 0, 1e7), 2, 2)
. The results coincide almost perfectly (as I would expect in a simple example like this). Everything fine so far.
Now I change the dataset. ("tdx" is a series of tax rates that I want to analyse using state space models.) The version with P1 <- matrix(0, 2, 2); P1inf <- diag(2)
gives plausible filtering and smoothing results. But the version with P1 <- matrix(c(1e7, 0, 0, 1e7), 2, 2)
looks like complete nonsense. It does not coincide at all with the reference version and shows a diverging trend.
Can someone explain to me what is going on here?
(Apparently, I'm not able to attach the data file here. It's not large (approximately 100 data points). I would be happy to send it to anyone wanting to give it a try.)
rm(list = ls())
library(KFAS)
data("alcohol", package = "KFAS")
deaths <- window(alcohol[, 2], end = 2007)
population <- window(alcohol[, 6], end = 2007)
data <- deaths / population
# load("data_tdx.RData")
# data <- tdx
# this is the model from the KFAS article (random walk with drift)
Zt <- matrix(c(1, 0), 1, 2)
Ht <- matrix(NA)
Tt <- matrix(c(1, 0, 1, 1), 2, 2)
Rt <- matrix(c(1, 0), 2, 1)
Qt <- matrix(NA)
a1 <- matrix(c(1, 0), 2, 1)
P1 <- matrix(0, 2, 2)
P1inf <- diag(2)
model_gaussian <- SSModel(data ~ -1 +
SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1 = P1,
P1inf = P1inf), H = Ht)
fit_gaussian <- fitSSM(model_gaussian, inits = c(0, 0), method = "BFGS")
out_gaussian <- KFS(fit_gaussian$model)
# this reproduces figure 1 from the article
temp <- cbind(data, out_gaussian$a[,1], out_gaussian$alphahat[,1])
cols <- c("black","red","blue")
lwd <- 2
lty <- c(1,1,1)
plot.ts(temp, plot.type="single", col = cols,lwd = lwd,lty=lty,xlab="",ylab="",main="")
legend("bottomright",
legend = c("Observation Data","Trend filtered","Trend smoothed"),
col = c("black","red","blue"), lty = c(1,1,1),
lwd = c(2,2,2), bty="n", cex=0.9 )
# the same with explicit initialisation
P1 <- matrix(c(1e7, 0, 0, 1e7), 2, 2)
model_gaussian1 <- SSModel(data ~ -1 +
SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1 = P1), H = Ht)
fit_gaussian1 <- fitSSM(model_gaussian1, inits = c(0, 0), method = "BFGS")
out_gaussian1 <- KFS(fit_gaussian1$model)
temp <- cbind(data, out_gaussian1$a[,1], out_gaussian1$alphahat[,1])
cols <- c("black","red","blue")
lwd <- 2
lty <- c(1,1,1)
plot.ts(temp, plot.type="single", col = cols,lwd = lwd,lty=lty,xlab="",ylab="",main="")
legend("bottomright",
legend = c("Observation Data","Trend filtered","Trend smoothed"),
col = c("black","red","blue"), lty = c(1,1,1),
lwd = c(2,2,2), bty="n", cex=0.9 )
(check <- cbind(out_gaussian$a[,1], out_gaussian1$a[,1],
out_gaussian$alphahat[,1], out_gaussian1$alphahat[,1]))
I ran the code with the alternative data set. I expected the results of the two models to be very close and deviations to be concentrated at the beginning of the series (initialisation). Actually, the two models differ fundamentally and the second one cannot be a plausible filtering outcome.
答案1
得分: 2
通过运行您通过电子邮件提供的税率数据集的代码块,看起来结果的差异归结为优化过程陷入某些局部最优解。您可以通过比较通过 fit_gaussian$optim.out
和 fit_gaussian1$optim.out
产生的函数值和参数来看到这一点。
这在状态空间模型中经常会出现问题,正如Helske在KFAS指南/JSS论文中所述:
>状态空间模型的参数估计通常是一项困难的任务,因为似然曲面包含多个极大值,这使得优化问题高度依赖于初始值。通常,未知参数与未观察到的潜在状态相关,例如本示例中的协方差矩阵,具有较少的先验知识。因此,在更复杂的情况下,猜测良好的初始值是具有挑战性的。因此,在可以合理确信找到正确的极值之前,建议尝试多种初始值配置,可能使用几种不同类型的优化程序。
我找到了解决这个问题的几种可能方法。第一种是将P1协方差初始化减小为较小的值,例如1e4(这是有意义的--您知道潜在状态的方差很小,因为税率数据的方差非常小)。另一种选择是为优化提供更好的初始化,就像Helske建议的那样。例如,您可以将其设置为数据的方差(或者更确切地说是方差的对数,因为fitSSM函数是这样设计的)。这将导致预期结果:
fit_gaussian1 <- fitSSM(model_gaussian1, inits = c(-8, -8), method="BFGS")
最后,您可以遵循Helske的另一个建议,并更改优化方法:
fit_gaussian1 <- fitSSM(model_gaussian1, inits = c(0, 0))
这使用optim
函数的默认Nelder-Mead方法,在这种情况下产生更合理的结果(尽管一般情况下可能不是这样的)。
英文:
Having run the code block with the tax rate data set you provided via email, it seems the discrepancy in result comes down to the optimization routine getting stuck in some local optimum. You can see this by comparing the function values and parameters produced via fit_gaussian$optim.out
and fit_gaussian1$optim.out
.
This can often be an issue in state space models, as Helske comments in the KFAS vignette/JSS paper:
>Parameter estimation of a state space model is often a difficult task, as the likelihood surface contains multiple maxima, thus making the optimization problem highly dependent on the initial values. Often the unknown parameters are related to the unobserved latent states, such as the covariance matrix in this example, with little a priori knowledge. Therefore, it is challenging to guess good initial values, especially in more complex settings. Thus, multiple initial value configurations possibly with several different type of optimization routines is recommended before one can be reasonably sure that proper optimum is found.
I found several possible solutions to this issue. The first is to reduce the P1 covariance initialization to a smaller value such as 1e4 (which makes sense--you know the latent state will have low variance since the tax rate data has very low variance). Another option is to have a better initialization for the optimization, as Helske suggests. For example you could set it as the variance of your data (or rather the log of the variance since that is how the fitSSM function is designed). This will lead to the intended result:
fit_gaussian1 <- fitSSM(model_gaussian1, inits = c(-8, -8), method="BFGS")
Finally, you could follow the other suggestion of Helske and change the optimization method:
fit_gaussian1 <- fitSSM(model_gaussian1, inits = c(0, 0))
This uses the default Nelder-Mead method of the optim
function, and produces more sensible results in this case (although that may not be generally true).
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论