Replace missing datapoints with the average of 2 other distant observations when there are multiple missing observations

Question

I have a dataset of net hourly animal movements, but on several occasions the observers were absent, which left gaps in the data. I want to fill each missing datapoint (in a new column) with the average of the values at the same hour 24 hours before and 24 hours after it.

Example data:

```r
# Data creation
Day1 <- rep(1, 24)
Day2 <- rep(2, 24)
Day3 <- rep(3, 24)
Day <- c(Day1, Day2, Day3)
Hour <- rep(0:23, 3)
Net <- round(rnorm(length(Day), mean = 2))
Dat <- data.frame(Day = Day, Hour = Hour, Net = Net)
# Populate missing observations
Dat[27, 3] <- NA
Dat[31, 3] <- NA
Dat
```

I initially applied the function below, which locates a missing value and then uses its row index to take the average of the rows 24 hours before and after it.

```r
library(dplyr)  # for if_else()
Dat$new.net <- sapply(Dat[, 3], function(x)
  if_else(is.na(x), mean(c(Dat[which(is.na(Dat), arr.ind = TRUE)[1] - 24, 3], Dat[which(is.na(Dat), arr.ind = TRUE)[1] + 24, 3])), x))
```

I cannot find a way to make this work for multiple missing values, so that each gap gets its own average. Currently every gap receives the average computed for the first missing value, because `which(is.na(Dat), arr.ind = TRUE)[1]` always returns the row of the first NA, whichever value is being processed.
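
Evaluating the index expression on its own shows why every gap ends up with the same value. A minimal sketch, assuming a hypothetical version of the example data in which `Net` equals the row number (so the correct fill for a gap at row i is simply i), with base `ifelse()` standing in for `dplyr::if_else()` to avoid the package dependency:

```r
# Hypothetical data: Net equals the row number, so the correct fill for a
# gap at row i is mean(c(i - 24, i + 24)) = i
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3),
                  Net = as.numeric(1:72))
Dat[27, 3] <- NA
Dat[31, 3] <- NA

# which(arr.ind = TRUE) returns a (row, col) pair for every NA ...
na_pos <- which(is.na(Dat), arr.ind = TRUE)
print(na_pos)
# ... but [1] is just the first element: the row of the FIRST NA
print(na_pos[1])

# Same logic as the question, with base ifelse(); the index never depends on x
new_net <- sapply(Dat[, 3], function(x)
  ifelse(is.na(x),
         mean(c(Dat[which(is.na(Dat), arr.ind = TRUE)[1] - 24, 3],
                Dat[which(is.na(Dat), arr.ind = TRUE)[1] + 24, 3])),
         x))
new_net[c(27, 31)]  # both gaps get 27; row 31 should have been 31
```

Because the function only ever sees the value `x` and not its position, it has no way to tell one NA from another.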

How can I alter my code to work for multiple missing values, or is there a more elegant solution?

PS. I know I will have issues if there are missing values in the first or last 24 hours, where there is no previous or next day to borrow from. I will cross that bridge when I get there.
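
The edge problem is sharper than it looks, because R does not return NA for row indices below 1: index 0 selects nothing, and a negative index removes that element. An early gap can therefore silently average almost the whole column. A quick illustration on a hypothetical 72-value vector standing in for `Net` (data frame indexing such as `Dat[i - 24, 3]` behaves the same way):

```r
# Hypothetical stand-in for the Net column: 72 hourly values
net <- as.numeric(1:72)

# Gap near the end: an index past the last row just gives NA
net[60 + 24]
# Gap at row 24: index 0 selects nothing (a zero-length vector)
net[24 - 24]
# Gap at row 5: index -19 DROPS element 19 and returns the other 71 values
length(net[5 - 24])

# So mean(c(x[i - 24], x[i + 24])) would silently average 72 numbers here
mean(c(net[5 - 24], net[5 + 24]))
```

Setting out-of-range neighbour positions to NA (or dropping them) before indexing avoids both surprises.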

Any help will be greatly appreciated!

Answer 1

Score: 0

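Get the indices of the NA values, then subtract and add 24 to each to find the same hour on the previous and next day, setting any index that falls outside the data to NA. Then cbind the two sets of neighbouring values, take their rowMeans (ignoring NAs) and assign the result to the missing positions. Note that this overwrites column 3 (Net) in place, and that a gap whose neighbours are both missing becomes NaN.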

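```r
ind <- which(is.na(Dat[[3]]))          # positions of the missing values
ind_minus <- ind - 24                  # same hour, previous day
ind_minus[ind_minus < 1] <- NA         # no previous day: treat as missing
ind_plus <- ind + 24                   # same hour, next day
ind_plus[ind_plus > nrow(Dat)] <- NA   # no next day: treat as missing
# Average the two neighbours row by row, ignoring any that are NA
Dat[[3]][ind] <- rowMeans(cbind(Dat[[3]][ind_minus], Dat[[3]][ind_plus]),
                          na.rm = TRUE)
```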
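
If you would rather keep Net untouched and write the result to a new column, as the question asks, a position-based loop also works, because each call then knows which row it is filling. A minimal sketch; `fill_from_neighbours()` is a hypothetical helper name (not from any package), and it assumes the rows form a complete, evenly spaced hourly series so that the same hour on the adjacent day is always exactly 24 rows away:

```r
# Hypothetical helper (not from any package): fill each NA in x with the mean
# of the values `lag` positions before and after it. Assumes x is a complete,
# evenly spaced hourly series.
fill_from_neighbours <- function(x, lag = 24) {
  x <- as.numeric(x)
  n <- length(x)
  # Loop over positions rather than values, so each NA knows its own row
  vapply(seq_len(n), function(i) {
    if (!is.na(x[i])) return(x[i])
    nb <- c(i - lag, i + lag)
    nb <- nb[nb >= 1 & nb <= n]  # drop neighbours outside the data
    vals <- x[nb]
    # No usable neighbour: leave the gap as NA instead of NaN
    if (all(is.na(vals))) NA_real_ else mean(vals, na.rm = TRUE)
  }, numeric(1))
}

# Usage on hypothetical data where Net equals the row number
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3),
                  Net = as.numeric(1:72))
Dat[c(27, 31), "Net"] <- NA
Dat$new.net <- fill_from_neighbours(Dat$Net)
Dat$new.net[c(27, 31)]  # 27 31
```

Here Net keeps its NAs and the filled series lives in new.net. A gap on the first or last day falls back to its single available neighbour, and a gap with no usable neighbour stays NA.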

huangapple
  • Published on 6 February 2023 at 09:49:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75356716.html