February 6, 2023 09:49:23

Replace missing datapoints with the average of 2 other distant observations when there are multiple missing observations

Question

I have a dataset of net hourly animal movements but there are several occasions where observers were periodically absent. I wish to replace the missing datapoints (in a new column) with the average of the same time period 24 hours before and after the missing datapoint.

Example data:

#Data Creation
Day1 <- rep(1, 24)
Day2 <- rep(2, 24)
Day3 <- rep(3, 24)
Day <- c(Day1, Day2, Day3)
Hour <- rep(0:23, 3)
Net <- round(rnorm(length(Day), mean = 2))
Dat <- data.frame(Day = Day, Hour = Hour, Net = Net)
#Populate missing observations
Dat[27, 3] <- NA
Dat[31, 3] <- NA
Dat

I initially applied the function below, which locates a single missing value and then uses its row index to take the average of the rows 24 hours before and after it.

library(dplyr)  # provides if_else()
Dat$new.net <- sapply(Dat[, 3], function(x)
   if_else(is.na(x), mean(c(Dat[which(is.na(Dat), arr.ind = T)[1] - 24, 3], Dat[which(is.na(Dat), arr.ind = T)[1] + 24, 3])), x))

I cannot find a way to make the function I used for one missing value work for multiple missing values, producing a separate average for each one. Currently every missing row receives the average computed for the first missing value, because "which(is.na(Dat), arr.ind = T)[1]" always selects the first missing row.
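A minimal sketch of why dropping the [1] is the key step, assuming the same example data (rebuilt here with an added set.seed() call so the numbers are reproducible): which() on its own returns every missing row, and offsetting that whole index vector by 24 yields a separate before/after pair, and so a separate average, for each missing value.

```r
# Rebuild the example data; set.seed() is added only so the numbers are reproducible
set.seed(1)
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3))
Dat$Net <- round(rnorm(nrow(Dat), mean = 2))
Dat[c(27, 31), "Net"] <- NA

# which() without [1] returns every missing row, not just the first
miss <- which(is.na(Dat$Net))
print(miss)  # 27 31

# Offsetting the whole index vector gives one before/after pair per missing row
before <- Dat$Net[miss - 24]
after  <- Dat$Net[miss + 24]
print(cbind(miss, before, after, mean = (before + after) / 2))
```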

How can I alter my code to work for multiple missing values, or is there a more elegant solution?

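PS. I know I will have issues if there are missing values in the first or final 24 hours (i.e. on the first or last day), since those rows have no observation 24 hours before or after them. I will cross that bridge when I get there.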
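To illustrate what goes wrong near the start or end of the series, here is a small sketch on a toy vector, with an offset of 2 standing in for the 24-hour lag (the names x and offset are purely illustrative). In R a negative index silently drops elements instead of raising an error, while an index past the end returns NA, so an unguarded average near the edges can be wrong without any warning.

```r
# Toy series; an offset of 2 stands in for the 24-hour lag
x      <- c(10, 20, 30, 40)
offset <- 2

# Row 1 has no "previous" value: 1 - 2 = -1 is a negative index,
# which drops element 1 instead of failing
print(x[1 - offset])  # 20 30 40

# Row 3 has no "next" value: 3 + 2 = 5 is past the end, which gives NA
print(x[3 + offset])  # NA

# An unguarded average at row 1 therefore mixes in the wrong values
print(mean(c(x[1 - offset], x[1 + offset])))
```

This is why guarding the shifted indices (setting out-of-range ones to NA before subsetting) matters.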

Any help will be greatly appreciated!

Answer 1

Score: 0

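Get the indices of the NA values, then subtract 24 and add 24 to each of them to reach the same hour on the previous and next day. Any shifted index that falls outside the data is set to NA, the two sets of neighbouring values are combined with cbind, and their rowMeans (with na.rm = TRUE, so an edge row uses whichever neighbour exists) are assigned to the missing positions. Note that this overwrites column 3 in place rather than creating a new column: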

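ind <- which(is.na(Dat[[3]]))            # rows where Net is missing
ind_minus <- ind - 24                    # same hour on the previous day
ind_minus[ind_minus < 1] <- NA           # no previous day -> NA
ind_plus <- ind + 24                     # same hour on the next day
ind_plus[ind_plus > nrow(Dat)] <- NA     # no next day -> NA
# average the available neighbours and write them into the missing rows
Dat[[3]][ind] <- rowMeans(cbind(Dat[[3]][ind_minus], Dat[[3]][ind_plus]),
                          na.rm = TRUE)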
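Since the question asked for the filled values in a new column, here is a sketch that wraps the same index arithmetic in a small function and returns a new vector instead of overwriting Net. The function name fill_by_day_mean and its offset argument are hypothetical, not part of the answer above. One extra step is assumed: when both neighbours are missing, rowMeans(..., na.rm = TRUE) returns NaN, which the sketch turns back into NA.

```r
# Hypothetical helper: fill each NA with the mean of the values `offset` rows
# before and after it, returning a new vector (the input is left unchanged)
fill_by_day_mean <- function(x, offset = 24) {
  out    <- x
  ind    <- which(is.na(x))
  before <- ind - offset
  after  <- ind + offset
  before[before < 1] <- NA          # no previous day: treat the neighbour as missing
  after[after > length(x)] <- NA    # no next day: treat the neighbour as missing
  m <- rowMeans(cbind(x[before], x[after]), na.rm = TRUE)
  m[is.nan(m)] <- NA                # both neighbours missing: leave the value as NA
  out[ind] <- m
  out
}

# Example data as in the question (set.seed() added for reproducibility)
set.seed(1)
Dat <- data.frame(Day = rep(1:3, each = 24), Hour = rep(0:23, 3))
Dat$Net <- round(rnorm(nrow(Dat), mean = 2))
Dat[c(27, 31), "Net"] <- NA

Dat$new.net <- fill_by_day_mean(Dat$Net)
print(Dat[c(27, 31), ])
```

The sketch assumes, like the answer, exactly one row per hour with no skipped rows, so that "24 rows away" always means "same hour, adjacent day".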
