Replace missing datapoints with the average of 2 other distant observations when there are multiple missing observations
Question
I have a dataset of net hourly animal movements but there are several occasions where observers were periodically absent. I wish to replace the missing datapoints (in a new column) with the average of the same time period 24 hours before and after the missing datapoint.
Example data:
# Data creation
Day1 <- rep(1, 24)
Day2 <- rep(2, 24)
Day3 <- rep(3, 24)
Day <- c(Day1, Day2, Day3)
Hour <- rep(0:23, 3)
Net <- round(rnorm(length(Day), mean = 2))
Dat <- data.frame(Day = Day, Hour = Hour, Net = Net)
# Populate missing observations
Dat[27, 3] <- NA
Dat[31, 3] <- NA
Dat
I initially applied the function below, which locates a single missing value and replaces it with the average of the rows 24 hours before and after it.
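library(dplyr)  # provides if_else()
# Only correct for one NA: the [1] always picks the first missing row
Dat$new.net <- sapply(Dat[, 3], function(x)
  if_else(is.na(x), mean(c(Dat[which(is.na(Dat), arr.ind = TRUE)[1] - 24, 3], Dat[which(is.na(Dat), arr.ind = TRUE)[1] + 24, 3])), x))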
I cannot find a way to make the function I used for one missing value work for multiple missing values, producing a unique average for each one. Currently every missing value receives the average computed for the first one, because of the [1] in "Dat[which(is.na(Dat),arr.ind = T)[1]".
How can I alter my code to work for multiple missing values, or is there a more elegant solution?
PS. I know I will have issues if there are missing values in the first or final 24 hours, where one of the two neighbouring days does not exist. I will cross that bridge when I get there.
Any help will be greatly appreciated!
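The sapply approach in the question can be made to work for several gaps by iterating over row positions instead of values, so each missing row looks up its own neighbours rather than the first NA's. The sketch below is one way to do that in base R (a plain if instead of dplyr::if_else); it assumes, as in the example data, exactly one row per hour with no gaps, and set.seed(1) is added only to make the random data reproducible. It also guards the first and last 24 rows, where one neighbour does not exist.

```r
# Recreate the example data (set.seed added only for reproducibility)
set.seed(1)
Day <- rep(1:3, each = 24)
Hour <- rep(0:23, 3)
Net <- round(rnorm(length(Day), mean = 2))
Dat <- data.frame(Day = Day, Hour = Hour, Net = Net)
Dat[27, 3] <- NA
Dat[31, 3] <- NA

n <- nrow(Dat)
# Loop over row positions so each NA uses its own neighbours
Dat$new.net <- sapply(seq_len(n), function(i) {
  x <- Dat$Net[i]
  if (!is.na(x)) return(x)                            # observed: keep as is
  before <- if (i > 24) Dat$Net[i - 24] else NA       # same hour, previous day
  after  <- if (i + 24 <= n) Dat$Net[i + 24] else NA  # same hour, next day
  mean(c(before, after), na.rm = TRUE)                # NaN if neither exists
})
Dat[c(27, 31), ]
```

Because the index i is passed in, the lookup no longer depends on which(is.na(Dat))[1], so rows 27 and 31 each get their own average.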
Answer 1
Score: 0
We could get the indices of the NA values, subtract 24 from and add 24 to each of them, cbind the values at those positions, take the rowMeans, and assign the result to the missing indices.
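# Rows with a missing Net value
ind <- which(is.na(Dat[[3]]))
# Same hour on the previous day; none exists in the first 24 rows
ind_minus <- ind - 24
ind_minus[ind_minus < 1] <- NA
# Same hour on the next day; none exists in the last 24 rows
ind_plus <- ind + 24
ind_plus[ind_plus > nrow(Dat)] <- NA
# Average whichever neighbours exist (overwrites column 3 in place)
Dat[[3]][ind] <- rowMeans(cbind(Dat[[3]][ind_minus], Dat[[3]][ind_plus]),
                          na.rm = TRUE)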
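The answer overwrites column 3, whereas the question asks for the filled values in a new column. A hedged variant, using a hypothetical helper named fill_from_neighbours(), applies the same shift-by-24 idea to a whole vector and returns a filled copy. It also turns the NaN that rowMeans(..., na.rm = TRUE) produces when both neighbours are missing back into NA. Like the answer, it assumes evenly spaced rows (one row per hour, no gaps).

```r
# Hypothetical helper: fill NAs with the mean of the values `lag` positions
# before and after; assumes the rows are evenly spaced (one row per hour).
fill_from_neighbours <- function(x, lag = 24) {
  n <- length(x)
  before <- c(rep(NA, lag), x)[seq_len(n)]        # x[i - lag], or NA near the start
  after  <- c(x, rep(NA, lag))[lag + seq_len(n)]  # x[i + lag], or NA near the end
  avg <- rowMeans(cbind(before, after), na.rm = TRUE)
  avg[is.nan(avg)] <- NA                          # both neighbours missing
  ifelse(is.na(x), avg, x)                        # only replace missing values
}

# Small check with lag = 2
fill_from_neighbours(c(1, NA, 3, 4, NA, 6), lag = 2)
# returns c(1, 4, 3, 4, 3, 6)
```

With the question's data, Dat$new.net <- fill_from_neighbours(Dat$Net) leaves Net untouched. If two missing values are exactly 24 hours apart, each simply falls back to its other neighbour.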