Applying user defined function to rolling window with zoo

Question

I am working on a research project, and I want to recreate a market efficiency measure I read about in an article. Since I am working with a large data set, I decided to automate the process in R. First, I defined a function that returns the standardized beta coefficients used in the measure, shown here with a reproducible example:

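beta_hats = function(j) {
  step1 = ar(j, aic = TRUE)$asy.var.coef  # asymptotic variance matrix of the AR coefficient estimates
  step2 = ar(j, aic = TRUE)$ar            # estimated AR coefficients
  step3 = chol(step1)                     # upper-triangular Cholesky factor
  step4 = t(step3)                        # transpose to a lower-triangular factor
  step5 = solve(step4)                    # inverse of the lower-triangular factor
  step6 = step5 %*% step2                 # standardized coefficients
  step7 = abs(step6)
  step8 = sum(step7)                      # sum of absolute standardized coefficients
  return(step8)
}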

repro = data.frame(rnorm(3000, 0.0003563425, 0.0216025))
beta_hats(repro)

> beta_hats(repro)
[1] 1.587869
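Reading directly from the code, and assuming the AIC selects p ≥ 1 coefficients, beta_hats() computes the quantity below. Here β̂ is `ar()$ar`, Σ̂ is `ar()$asy.var.coef`, and R is the upper-triangular factor returned by `chol()`. This is a transcription of the code's steps and may not match the notation used in the article.

```latex
\hat\Sigma = R^\top R, \qquad
\tilde\beta = \left(R^\top\right)^{-1} \hat\beta, \qquad
S = \sum_{i=1}^{p} \left|\tilde\beta_i\right|
```

In terms of the code, step3 is R, step5 is (Rᵀ)⁻¹, step6 is β̃, and step8 is S.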

This generates the desired outcome for the entire data set, however, I want my measure to be time-varying so I attempted to repeat the function over rolling windows.

y = repro
t = 250
library(zoo)
z = rollapplyr(y, t, function(y) beta_hats(y))

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'

At this point the function no longer works. Can anyone help me solve this issue?

Additional information:

  • Without the data.frame() wrapper in the reproducible example, the
    same error already occurs when the function is applied to the entire
    data set
  • Since the reproducible example is completely random, the function
    might produce a much higher value if you use real market returns to
    reproduce the error
  • class() of the data set returns "tbl_df", "tbl", "data.frame"

Answer 1

Score: 1

This is the error produced when you attempt a Cholesky decomposition of a NULL object:

chol(NULL)
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'

This shows that the problem lies within your data rather than in the rollapply function. Try regenerating the data and calling your function on it again. The asymptotic-theory variance matrix of the coefficient estimates (asy.var.coef) seems to be NULL. Note that ar() only returns it when the selected order is greater than 0.

For example:

set.seed(1)
repro = data.frame(a=rnorm(3000, 0.0003563425, 0.0216025))
ar(repro$a, aic =TRUE)$order
[1] 0

Since the order is 0, the asymptotic-theory variance matrix for this dataset from step1 will be NULL:

ar(repro$a, aic =TRUE)$asy.var.coef
[1] NULL

Hence step3 of your function (chol(step1)) will throw the error you see. You need to run your function on a dataset for which ar() selects an order greater than 0.
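Here is a minimal sketch of a guarded version of the function. The name beta_hats_safe() and the choice to return NA_real_ when ar() selects order 0 are assumptions made for illustration. Whether an order-0 sample should count as NA or as 0 depends on how the article defines the measure. The sketch also fits ar() once instead of twice, and it uses solve(L, b), which gives the same result as solve(L) %*% b.

```r
# Hypothetical guarded variant of beta_hats(); returning NA for order-0 fits
# is an illustrative choice, not part of the original measure.
beta_hats_safe <- function(j) {
  x <- as.numeric(unlist(j))        # accept a numeric vector or a one-column data.frame/tibble
  fit <- stats::ar(x, aic = TRUE)   # fit the AR model once
  if (fit$order == 0) {
    return(NA_real_)                # no AR terms selected, so asy.var.coef is NULL
  }
  L <- t(chol(fit$asy.var.coef))    # lower-triangular Cholesky factor (steps 1, 3, 4)
  sum(abs(solve(L, fit$ar)))        # standardized coefficients, summed in absolute value (steps 5-8)
}

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 1000))  # series with clear AR(1) structure
beta_hats_safe(x)
```

With this guard in place, zoo::rollapplyr(x, 250, beta_hats_safe) on a numeric vector x returns NA for the windows where the original function would stop with the chol() error. It no longer aborts the whole call.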

Also note that although the function might not throw an error on the full dataset, it can still throw one on a subset, such as a 250-observation rolling window, for the reasons stated above.
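To see how often this happens with a 250-observation window, the sketch below uses base R only. It records the AR order that ar() selects in each right-aligned window. window_orders() is a hypothetical helper, and the simulated returns reuse the mean and standard deviation from the question. Every window with order 0 is one where step1 returns NULL and chol() fails.

```r
# Hypothetical helper: AR order selected by AIC in each rolling window,
# covering the same windows that zoo::rollapplyr(x, width, ...) would use.
window_orders <- function(x, width = 250) {
  starts <- seq_len(length(x) - width + 1)
  vapply(starts, function(i) {
    as.integer(stats::ar(x[i:(i + width - 1)], aic = TRUE)$order)
  }, integer(1))
}

set.seed(42)
x <- rnorm(3000, 0.0003563425, 0.0216025)   # simulated returns, as in the question
ord <- window_orders(x)
table(ord)          # distribution of selected orders across windows
mean(ord == 0)      # share of windows where the original beta_hats() would fail
```

A single window with order 0 is enough to make rollapplyr() with the original beta_hats() stop, even when the full series selects a positive order.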

huangapple
  • Posted on 14 February 2023, 06:28:23
  • Please keep this link when reposting: https://go.coder-hub.com/75441792.html