英文:
Backwards fill last instances of NAs in R Data Table
问题
library(data.table)
dt <- data.table(V1 = c(58,NA,NA,NA,NA,13,NA,NA,NA,12,23,NA,12))
dt[, rleid := rleid(V1)]
dt[, num := seq_len(.N), by = rleid]
dt[, max_num := max(num, na.rm = TRUE), by = rleid]
# Fill NAs based on condition
dt[, V2 := ifelse(is.na(V1) & num <= 2, V1, NA), by = rleid]
# Fill NAs forward
dt[, V2 := nafill(V2, type = "locf"), by = rleid]
# Remove the temporary columns
dt[, c("rleid", "num", "max_num") := NULL]
# Result
result <- dt$V2
result
这个代码会更快地填充NA值,得到你期望的结果。
英文:
I have a data.table with a column like:
c(58,NA,NA,NA,NA,13,NA,NA,NA,12,23,NA,12)
I would like to fill only the two NAs before each non-NA value in the column by carrying backwards the next non-NA value. The result should be :
c(58,NA,NA,13,13,13,NA,12,12,12,23,12,12)
I have managed to do it with :
dt = data.table(V1 = c(58,NA,NA,NA,NA,13,NA,NA,NA,12,23,NA,12))
dt[, rleid:=rleid(dt$V1)]
dt[, num := seq(.N), rleid]
u=1
arr = c()
for (i in 1:(nrow(dt)-1)){
if(dt$rleid[i] == dt$rleid[i+1]){
u=u+1
next
}
else{
arr = append(arr,u)}
u=1
}
arr=append(arr,1)
v=c()
for (i in 1:(length(arr))){
for (j in 1:arr[i]){
v=append(v,arr[i])
}
}
dt[, len:=v]
dt[, val:=len-num]
dt[, V2 := fifelse(is.na(V1) & val<=1, nafill(V1, "nocb"), V1)]
This solution takes too long for a big data table. Any suggestions that are faster ?
答案1
得分: 4
以下是翻译好的部分:
A quick and dirty data.table
solution:
快速且不太规范的 data.table
解决方案:
dt[, V1b := fcoalesce(c(list(V1), shift(V1, -(1:2))))]
Or simply (as suggested by B. Christian Kamgang)
或者简单地(如B. Christian Kamgang建议的)
dt[, V1b := fcoalesce(shift(V1, -(0:2)))]
V1 V1b
<num> <num>
1: 58 58
2: NA NA
3: NA NA
4: NA 13
5: NA 13
6: 13 13
7: NA NA
8: NA 12
9: NA 12
10: 12 12
11: 23 23
12: NA 12
13: 12 12
英文:
A quick and dirty data.table
solution:
dt[, V1b := fcoalesce(c(list(V1), shift(V1, -(1:2))))]
# Or simply (as suggested by B. Christian Kamgang)
dt[, V1b := fcoalesce(shift(V1, -(0:2)))]
V1 V1b
<num> <num>
1: 58 58
2: NA NA
3: NA NA
4: NA 13
5: NA 13
6: 13 13
7: NA NA
8: NA 12
9: NA 12
10: 12 12
11: 23 23
12: NA 12
13: 12 12
答案2
得分: 2
你可以修改提供给你的函数这里,通过两次反转向量来实现,即:
na_locf_max_backwards <- function(x, nmax){
x <- rev(x)
s <- split(x, cumsum(!is.na(x)))
l <- mapply(\(x, y) {
x[1:nmax+1] <- x[1]
length(x) <- y
x
}, s, lengths(s))
x <- unlist(l, use.names = FALSE)
x <- rev(x)
x
}
na_locf_max_backwards(c(58,NA,NA,NA,NA,13,NA,NA,NA,12,23,NA,12), nmax = 2)
# [1] 58 NA NA 13 13 13 NA 12 12 12 23 12 12
英文:
You can modify the function given to you here by reversing the vector twice, i.e.
na_locf_max_backwards <- function(x, nmax){
x <- rev(x)
s <- split(x, cumsum(!is.na(x)))
l <- mapply(\(x, y) {
x[1:nmax+1] <- x[1]
length(x) <- y
x
}, s, lengths(s))
x <- unlist(l, use.names = FALSE)
x <- rev(x)
x
}
na_locf_max_backwards(c(58,NA,NA,NA,NA,13,NA,NA,NA,12,23,NA,12), nmax = 2)
# [1] 58 NA NA 13 13 13 NA 12 12 12 23 12 12
答案3
得分: 1
另一个 data.table 解决方案:
dt[, V2 := fifelse(rev(rowid(rev(rleid(V1)))) <= 2, nafill(V1, "nocb"), V1)]
V1 V2
1: 58 58
2: NA NA
3: NA NA
4: NA 13
5: NA 13
6: 13 13
7: NA NA
8: NA 12
9: NA 12
10: 12 12
11: 23 23
12: NA 12
13: 12 12
英文:
Another data.table solution:
dt[, V2 := fifelse(rev(rowid(rev(rleid(V1))))<=2, nafill(V1, "nocb"), V1)]
V1 V2
1: 58 58
2: NA NA
3: NA NA
4: NA 13
5: NA 13
6: 13 13
7: NA NA
8: NA 12
9: NA 12
10: 12 12
11: 23 23
12: NA 12
13: 12 12
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论