Replace missing datapoints with the average of 2 other distant observations when there are multiple missing observations
Question
I have a dataset of net hourly animal movements, but on several occasions the observers were absent. I want to fill the missing data points (in a new column) with the average of the values recorded at the same hour 24 hours before and 24 hours after each missing point.
Example data:
# Data creation
Day1 <- rep(1, 24)
Day2 <- rep(2, 24)
Day3 <- rep(3, 24)
Day <- c(Day1, Day2, Day3)
Hour <- rep(0:23, 3)
Net <- round(rnorm(length(Day), mean = 2))
Dat <- data.frame(Day = Day, Hour = Hour, Net = Net)
# Populate missing observations
Dat[27, 3] <- NA
Dat[31, 3] <- NA
Dat
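The ±24 row offset in this approach only works because the rows are sorted by `Day` and then `Hour`, with exactly one row per hour and no gaps. Under that assumption, row `i` holds Day `(i - 1) %/% 24 + 1` and Hour `(i - 1) %% 24`, so rows 3, 27 and 51 are Hour 2 on Days 1, 2 and 3. A quick check, as a sketch (the `set.seed()` call is an addition for reproducibility, not part of the original data):

```r
# Rebuild the example data (set.seed added here only for reproducibility)
set.seed(1)
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3))
Dat$Net <- round(rnorm(nrow(Dat), mean = 2))
Dat$Net[c(27, 31)] <- NA

# Row i corresponds to Day (i - 1) %/% 24 + 1 and Hour (i - 1) %% 24,
# so rows i - 24 and i + 24 are the same hour on the previous and next day
i <- 27
Dat[c(i - 24, i, i + 24), c("Day", "Hour")]
```

If hours can be missing as whole rows rather than as `NA` values, this correspondence breaks and the offset points at the wrong hour.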
I initially applied the function below, which locates a single missing value and averages the rows 24 hours before and after it.
library(dplyr)  # if_else() comes from dplyr
Dat$new.net <- sapply(Dat[, 3], function(x)
  if_else(is.na(x), mean(c(Dat[which(is.na(Dat), arr.ind = TRUE)[1] - 24, 3],
                           Dat[which(is.na(Dat), arr.ind = TRUE)[1] + 24, 3])), x))
I cannot find a way to make this work for multiple missing values so that each one gets its own average. Currently every missing value receives the average computed for the first one, because `which(is.na(Dat), arr.ind = T)[1]` always returns the row of the first NA.
How can I alter my code to work for multiple missing values, or is there a more elegant solution?
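The underlying issue is that `sapply(Dat[, 3], ...)` passes the function each value, not its row position, so the function cannot tell which row it is filling and falls back on the first NA. A minimal sketch of one fix, assuming the same ordered hourly layout: iterate over row positions with `seq_len(nrow(Dat))` so each missing row looks up its own neighbours. `set.seed(1)` is added only for reproducibility.

```r
# Example data as in the question (set.seed added for reproducibility)
set.seed(1)
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3))
Dat$Net <- round(rnorm(nrow(Dat), mean = 2))
Dat$Net[c(27, 31)] <- NA

# Iterate over row positions rather than values, so each missing row
# uses its own neighbours instead of those of the first NA
Dat$new.net <- sapply(seq_len(nrow(Dat)), function(i) {
  if (!is.na(Dat$Net[i])) return(Dat$Net[i])
  # Same hour on the previous / next day; NA if that row does not exist
  prev <- if (i > 24) Dat$Net[i - 24] else NA_real_
  nxt  <- if (i + 24 <= nrow(Dat)) Dat$Net[i + 24] else NA_real_
  # Average whichever neighbours are available (NaN if neither is)
  mean(c(prev, nxt), na.rm = TRUE)
})
```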
P.S. I know I will run into problems if there are missing values in the first or last 24 hours, because there is no observation 24 hours earlier or later. I will cross that bridge when I get there.
Any help will be greatly appreciated!
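One way to sidestep the first/last-day problem, offered as a sketch and a different technique from the ±24 row offset, is to look neighbours up by their `(Day, Hour)` key with `match()`: a day that does not exist simply yields `NA`. This also tolerates absent rows, assuming each `Day`/`Hour` pair appears at most once (`match()` returns only the first hit). The example adds a hypothetical NA on Day 1 (row 5) to show the edge case:

```r
# Example data as in the question, plus a hypothetical NA on Day 1 (row 5)
set.seed(1)
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3))
Dat$Net <- round(rnorm(nrow(Dat), mean = 2))
Dat$Net[c(5, 27, 31)] <- NA

# Look up neighbours by (Day, Hour) key instead of by row offset.
# match() returns NA when Day - 1 or Day + 1 does not exist,
# which covers the first and last day without special cases.
key  <- paste(Dat$Day, Dat$Hour)
prev <- Dat$Net[match(paste(Dat$Day - 1, Dat$Hour), key)]
nxt  <- Dat$Net[match(paste(Dat$Day + 1, Dat$Hour), key)]

# Mean of the available neighbours, used only where Net is missing
fill <- rowMeans(cbind(prev, nxt), na.rm = TRUE)
Dat$new.net <- ifelse(is.na(Dat$Net), fill, Dat$Net)
```

Row 5 has no Day 0 neighbour, so it takes the Day 2 value alone.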
Answer 1
Score: 0
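We could get the indices of the NA values, subtract and add 24 to each of them to find the same hour on the previous and next day, `cbind` the two sets of values, take their `rowMeans` (with `na.rm = TRUE`, so that a neighbour that is missing or out of range is ignored), and assign the result to the missing positions.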
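# Row indices of the missing Net values
ind <- which(is.na(Dat[[3]]))
# Same hour on the previous day; indices before the first row become NA
ind_minus <- ind - 24
ind_minus[ind_minus < 1] <- NA
# Same hour on the next day; indices past the last row become NA
ind_plus <- ind + 24
ind_plus[ind_plus > nrow(Dat)] <- NA
# Average the available neighbours (NaN if neither exists) and store
# the result in a new column, leaving the original Net column intact
Dat$new.net <- Dat[[3]]
Dat$new.net[ind] <- rowMeans(cbind(Dat[[3]][ind_minus], Dat[[3]][ind_plus]),
                             na.rm = TRUE)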
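The same idea can be packaged as a small reusable function built from shifted copies of the column, which handles the first and last day without explicit bounds checks. `fill_lag_mean` is a hypothetical helper name, and the sketch assumes complete, ordered hourly rows:

```r
# Example data as in the question (set.seed added for reproducibility)
set.seed(1)
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3))
Dat$Net <- round(rnorm(nrow(Dat), mean = 2))
Dat$Net[c(27, 31)] <- NA

# Hypothetical helper: where x is NA, use the mean of the values `lag`
# positions before and after; other values are returned unchanged
fill_lag_mean <- function(x, lag = 24) {
  n <- length(x)
  prev <- c(rep(NA, lag), x)[seq_len(n)]        # x[i - lag], NA if i <= lag
  nxt  <- c(x, rep(NA, lag))[lag + seq_len(n)]  # x[i + lag], NA if i + lag > n
  fill <- rowMeans(cbind(prev, nxt), na.rm = TRUE)
  ifelse(is.na(x), fill, x)
}

Dat$new.net <- fill_lag_mean(Dat$Net)
```

Because the shifted vectors are padded with `NA`, a gap on the first or last day falls back to the single available neighbour, and `lag` can be changed for other periods (for example `lag = 168` for the same hour one week apart, assuming hourly rows).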