Replace missing datapoints with the average of 2 other distant observations when there are multiple missing observations

Question

I have a dataset of net hourly animal movements, but observers were absent on several occasions. I want to replace each missing datapoint (in a new column) with the average of the values recorded at the same hour 24 hours before and 24 hours after it.

Example data:

# Data creation
Day1 <- rep(1, 24)
Day2 <- rep(2, 24)
Day3 <- rep(3, 24)
Day <- c(Day1, Day2, Day3)
Hour <- rep(0:23, 3)
Net <- round(rnorm(length(Day), mean = 2))
Dat <- data.frame(Day = Day, Hour = Hour, Net = Net)

# Populate missing observations
Dat[27, 3] <- NA
Dat[31, 3] <- NA
Dat

I initially applied a function (below) that would locate a single missing value and then index the missing datapoint to locate and take the average of the rows 24 hours before and after the missing point.

library(dplyr)  # provides if_else()
Dat$new.net <- sapply(Dat[, 3], function(x)
   if_else(is.na(x), mean(c(Dat[which(is.na(Dat), arr.ind = TRUE)[1] - 24, 3], Dat[which(is.na(Dat), arr.ind = TRUE)[1] + 24, 3])), x))

I cannot find a way to make the function I used for one missing value work for multiple missing values, so that each missing value gets its own average. Currently the code fills every gap with the average computed for the first missing value, because the "[1]" index applied to the which(is.na(Dat)) result always picks the row of the first missing value.
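To make that diagnosis concrete, here is a minimal sketch on a made-up five-row data frame (the name toy and its values are assumptions for illustration, not part of the dataset above). It shows that the "[1]" index keeps only the row of the first missing value, while which() on the column itself returns every affected row.

```r
# Toy data (hypothetical): missing Net values in rows 2 and 4
toy <- data.frame(Hour = 0:4, Net = c(1, NA, 3, NA, 5))

# One row per missing cell, giving its row and column position
na_pos <- which(is.na(toy), arr.ind = TRUE)
na_pos

# [1] keeps only the first element of that matrix: the row of the first NA
first_row <- na_pos[1]
first_row

# Every row with a missing Net value, which a vectorised fix needs
all_rows <- which(is.na(toy$Net))
all_rows
```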

How can I alter my code to work for multiple missing values, or is there a more elegant solution?

PS. I know I will have issues if there are missing values in the first or final 24 hours, since those rows have no observation 24 hours before or after them. I will cross that bridge when I get there.

Any help will be greatly appreciated!

Answer 1

Score: 0

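We could get the indices of the NA values, subtract 24 from and add 24 to each of them, cbind the values found at those two sets of positions, take the rowMeans, and assign the result to the missing positions. Positions that fall outside the data are set to NA, and na.rm = TRUE lets the mean fall back on whichever neighbour exists.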

ind <- which(is.na(Dat[[3]]))
ind_minus <- ind - 24
ind_minus[ind_minus < 1] <- NA
ind_plus <- ind + 24
ind_plus[ind_plus > nrow(Dat)] <- NA

Dat[[3]][ind] <- rowMeans(cbind(Dat[[3]][ind_minus], Dat[[3]][ind_plus]),
     na.rm = TRUE)
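The code above overwrites Net in place. Since the question asks for the filled values in a new column, a variant along the same lines might look like the sketch below. The seed is arbitrary and only there for reproducibility, the data are rebuilt so the block runs on its own, and the column name new.net is borrowed from the question's attempt.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
Dat <- data.frame(Day = rep(1:3, each = 24),
                  Hour = rep(0:23, times = 3),
                  Net = round(rnorm(72, mean = 2)))
Dat$Net[c(27, 31)] <- NA

# Rows with a missing value, and the rows 24 hours before and after them
ind <- which(is.na(Dat$Net))
ind_minus <- ind - 24
ind_plus <- ind + 24

# Neighbours that fall outside the data are treated as missing
ind_minus[ind_minus < 1] <- NA
ind_plus[ind_plus > nrow(Dat)] <- NA

# Copy Net into a new column and fill only the missing rows
Dat$new.net <- Dat$Net
Dat$new.net[ind] <- rowMeans(cbind(Dat$Net[ind_minus], Dat$Net[ind_plus]),
                             na.rm = TRUE)

# Inspect the filled rows; Net keeps its NA, new.net holds the average
Dat[ind, ]
```

One caveat: if both neighbours of a gap are themselves missing or out of range, rowMeans with na.rm = TRUE returns NaN for that row, so such cases would still need separate handling.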

huangapple
  • Published on February 6, 2023 at 09:49:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75356716.html