Imputing missing values in R using autocorrelation

Question

I'm trying to fill in two missing values. My lecturer suggests choosing the values that give the largest autocorrelation, in the following stages:

  1. Find the minimum and maximum of the dataset. Suppose I use the AirPassengers data with two observations omitted. The minimum is 104 and the maximum is 622.
  2. For each combination of candidate values for the two missing observations, compute the lag-1 autocorrelation (ACF). The candidates are the integers between the minimum and the maximum (104 <= x <= 622).
  3. Select the imputation that gives the largest autocorrelation.
  4. The expected output is a matrix of the lag-1 autocorrelations of the imputed time series.
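
As a reference for step 2, here is a minimal sketch of what the lag-1 autocorrelation is, checked against `acf()`. The helper name `lag1_manual` is hypothetical (it is not from the original post), and the formula is the standard sample autocorrelation that `acf()` computes by default (demeaned series, lag-1 autocovariance divided by the lag-0 autocovariance).

```r
# Hypothetical helper: lag-1 sample autocorrelation computed by hand.
lag1_manual <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  d <- x - mean(x)                 # deviations from the overall mean
  sum(d[-n] * d[-1]) / sum(d^2)    # lag-1 autocovariance over lag-0 autocovariance
}

x <- AirPassengers                 # complete built-in series, no NAs
r_manual <- lag1_manual(x)

# Element 1 of $acf is lag 0, so lag 1 is element 2.
r_acf <- acf(x, plot = FALSE)$acf[2]

all.equal(r_manual, r_acf)
```

This is why the code in this thread reads `$acf[2]` rather than `$acf[1]`, which is always 1.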

I'm trying to compute this in R, but the code I wrote throws an error and I'm not sure how to continue. Here is the code:

AirPassengers[43]<-NA
AirPassengers[100]<-NA
Fun_mv = function(g,h){
  g=104:622
  n=length(g)
  empty_matrix=matrix(nrow = n, ncol = n, dimnames = list(g,g))
  for (i in g){
    for (j in g){
      AirPassengers[43]=i
      AirPassengers[100]=j
      empty_matrix[i,j]=acf(AirPassengers)$acf[2]
    }
  }
}
h=outer(g,g,FUN = Fun_mv);h

Any help getting the code to work correctly is greatly appreciated!

Answer 1

Score: 0

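There is no need for a call to outer; the function's double loop already covers every combination. The function also has to return the matrix, and it has to index the matrix by position (seq_along), because the candidate values 104:622 are not the row and column positions 1:519.
Note that the assignment to AirPassengers[43] is outside the inner loop, since it depends only on i, and that acf(., plot = FALSE) saves a lot of time by skipping the plot.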

Fun_mv <- function(g, h){
  # rows: candidates for observation 43; columns: candidates for observation 100
  empty_matrix <- matrix(nrow = length(g), ncol = length(h), dimnames = list(g, h))
  for (i in seq_along(g)){
    AirPassengers[43] <- g[i]
    for (j in seq_along(h)){
      AirPassengers[100] <- h[j]
      # $acf[1] is lag 0, so $acf[2] is the lag-1 autocorrelation
      empty_matrix[i, j] <- acf(AirPassengers, plot = FALSE)$acf[2]
    }
  }
  empty_matrix
}

AirPassengers[43] <- NA
AirPassengers[100] <- NA
g <- 104:622
h <- Fun_mv(g, g)

str(h)
#>  num [1:519, 1:519] 0.871 0.871 0.871 0.871 0.871 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:519] "104" "105" "106" "107" ...
#>   ..$ : chr [1:519] "104" "105" "106" "107" ...
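
Step 3 of the question (choosing the pair with the largest lag-1 autocorrelation) can be read off such a matrix with `which(..., arr.ind = TRUE)`. The following is a minimal sketch, not part of the original answer. To stay self-contained and fast it rebuilds a small matrix on a coarse grid, which is an assumption made for illustration (use `104:622` for the full search). The names `ap`, `cand`, `acf_mat`, `best`, `imputed_43`, and `imputed_100` are hypothetical.

```r
ap <- AirPassengers                   # work on a copy of the built-in series
cand <- seq(104, 622, by = 37)        # coarse grid (assumption): 15 candidates

acf_mat <- matrix(NA_real_, nrow = length(cand), ncol = length(cand),
                  dimnames = list(cand, cand))
for (i in seq_along(cand)) {
  ap[43] <- cand[i]
  for (j in seq_along(cand)) {
    ap[100] <- cand[j]
    acf_mat[i, j] <- acf(ap, plot = FALSE)$acf[2]
  }
}

# Row and column position of the largest lag-1 autocorrelation.
best <- which(acf_mat == max(acf_mat), arr.ind = TRUE)
imputed_43  <- cand[best[1, "row"]]   # chosen value for observation 43
imputed_100 <- cand[best[1, "col"]]   # chosen value for observation 100
```

If several pairs tie for the maximum, `best` has more than one row and `best[1, ]` simply takes the first of them.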

Created on 2023-05-13 with reprex v2.0.2
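
If `outer()` is still preferred over explicit loops, it can be made to work. `outer()` calls `FUN` once with two long vectors rather than once per pair, so a function that handles a single pair has to be wrapped, for example with `Vectorize()`. The sketch below is one way to do that and is not the answer's method. The helper `lag1_for_pair` and the coarse grid are assumptions made for illustration.

```r
ap <- AirPassengers                   # copy of the built-in series

# Hypothetical scalar helper: lag-1 ACF after imputing one candidate pair.
lag1_for_pair <- function(a, b, series = ap) {
  series[43] <- a
  series[100] <- b
  acf(series, plot = FALSE)$acf[2]
}

cand <- seq(104, 622, by = 37)        # coarse grid (assumption); 104:622 for the full search

# Vectorize() makes the helper accept the expanded vectors that outer() passes in.
h_outer <- outer(cand, cand, Vectorize(lag1_for_pair, c("a", "b")))
dimnames(h_outer) <- list(cand, cand)
```

This still calls `acf()` once per pair, so it is mainly a matter of style rather than speed compared with the double loop.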

huangapple
  • Posted on 2023-05-13 15:43:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76241618.html