Replace last non-NA value(s) with NA [R]
Question
I have a dataframe with 7 character columns and 1 column with integers (x).
Column x gives the number of non-NA values, counted from the end of the row, that I want to replace with NA (in this example x is at most 2, but it could also be higher).
library(data.table)
d1 <- data.table(K0 = c("A", "B", "C", "D", "E", "E", "E", "F", "F", "F", "G", "G"),
K1 = c("H", "I", "J", "J", "J", "K", "K", "K", "1111", "1111", "L", "M"),
K2 = c("N", "O", "P", "P", "P", "Q", "Q", "Q", "1112", "1112", NA, "R"),
K3 = c("S", "T", "1113", "1113", "1113", "U", "U", "U", NA, NA, NA, "1114"),
K4 = c("P", NA, "1115", "1115", "1115", "1116", "1116", "1116", NA, NA, NA, NA),
K5 = c(NA, NA, "1117", "1117", "1117", "1118", "1118", "1118", NA, NA, NA, NA),
K6 = c(NA, NA, NA, NA, NA, "1119","1119","1119", NA, NA, NA, NA),
x = c(0,0,0,1,2,0,1,2,0,1,0,0))
I have tried a lot (nothing worth mentioning here), but I can't figure out how to solve this problem.
The desired output should look like this:
d2 <- data.table(K0 = c("A", "B", "C", "D", "E", "E", "E", "F", "F", "F", "G", "G"),
K1 = c("H", "I", "J", "J", "J", "K", "K", "K", "1111", "1111", "L", "M"),
K2 = c("N", "O", "P", "P", "P", "Q", "Q", "Q", "1112", NA, NA, "R"),
K3 = c("S", "T", "1113", "1113", "1113", "U", "U", "U", NA, NA, NA, "1114"),
K4 = c("P", NA, "1115", "1115", NA, "1116", "1116", "1116", NA, NA, NA, NA),
K5 = c(NA, NA, "1117", NA, NA, "1118", "1118", NA, NA, NA, NA, NA),
K6 = c(NA, NA, NA, NA, NA, "1119",NA,NA, NA, NA, NA, NA),
x = c(0,0,0,1,2,0,1,2,0,1,0,0))
An example for clarification: rows 3, 4 and 5 are identical apart from K0 and x, and they should become
Row 3: C, J, P, 1113, 1115, 1117, NA; x = 0 (0 = no replacement needed)
Row 4: D, J, P, 1113, 1115, NA, NA; x = 1 (1 = last non-NA value replaced with NA)
Row 5: E, J, P, 1113, NA, NA, NA; x = 2 (2 = last 2 non-NA values replaced with NA)
If some R mastermind could help me out, that would be great.
(I prefer dplyr, but at this point anything will do.)
Using R version 4.1.0.
Answer 1
Score: 1
One way, not using dplyr but using apply(), would be to create a function that replaces the last n non-missing values with NA, then use apply() to do this for every row:
replace_last_n <- function(x) {
  # The last element of the row is n, the number of values to replace with NA
  n <- as.numeric(x[length(x)])
  # Drop that element so only the K columns remain
  x <- x[-length(x)]
  # Nothing to do
  if (n == 0) return(c(x, n))
  # Number of non-missing values
  tmp <- sum(!is.na(x))
  # If n is greater than the number of non-missing values, the whole row becomes NA
  if (n > tmp) return(c(rep(NA, length(x)), n))
  # Replace the last n non-missing values with NA
  x[!is.na(x)][(tmp - n + 1):tmp] <- NA
  # Return the entire row, with n appended again
  return(c(x, n))
}
Here the last element of x is the number of values to replace with NA. For example:
replace_last_n(c(1,2,3,4,5, 1))
# [1] 1 2 3 4 NA 1
replace_last_n(c(1,2,3,4,NA, 1))
# [1] 1 2 3 NA NA 1
replace_last_n(c(1,2,3,NA,5, 2))
# [1] 1 2 NA NA NA 2
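The two early-return branches in replace_last_n() can probably be dropped: tail() already returns no positions for n = 0 and all positions when n exceeds the number of non-NA values. A minimal sketch of that idea on a plain vector (without n appended); the helper name blank_last_n() is hypothetical and not part of the answer:

```r
# Hypothetical helper (not from the answer): set the last n non-NA entries of v to NA
blank_last_n <- function(v, n) {
  # Positions of the non-missing entries, in order
  idx <- which(!is.na(v))
  # tail(idx, 0) is empty and tail(idx, n) returns all of idx when n >= length(idx),
  # so n = 0 and "n too large" need no special branches
  v[tail(idx, n)] <- NA
  v
}

blank_last_n(c(1, 2, 3, 4, 5), 1)    # -> 1 2 3 4 NA
blank_last_n(c(1, 2, 3, NA, 5), 2)   # -> 1 2 NA NA NA
blank_last_n(c(1, 2, 3), 0)          # -> 1 2 3 (unchanged)
blank_last_n(c(1, NA, 3), 5)         # -> NA NA NA
```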
Then you can use apply() to run replace_last_n() on every row:
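library(dplyr)  # provides %>% and as_tibble()
# apply() works on a character matrix, so the last column (V8, holding x) comes back as character
apply(d1, 1, replace_last_n) %>% t() %>% as_tibble()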
# K0 K1 K2 K3 K4 K5 K6 V8
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 A H N S P NA NA 0
# 2 B I O T NA NA NA 0
# 3 C J P 1113 1115 1117 NA 0
# 4 D J P 1113 1115 NA NA 1
# 5 E J P 1113 NA NA NA 2
# 6 E K Q U 1116 1118 1119 0
# 7 E K Q U 1116 1118 NA 1
# 8 F K Q U 1116 NA NA 2
# 9 F 1111 1112 NA NA NA NA 0
# 10 F 1111 NA NA NA NA NA 1
# 11 G L NA NA NA NA NA 0
# 12 G M R 1114 NA NA NA 0
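Because apply() first converts d1 to a character matrix, x comes back as a character column (V8). One way to avoid that, sketched below under the assumption that the K columns are character, is to loop over the rows of just those columns and leave x untouched. blank_last_n() and replace_last_n_df() are hypothetical names introduced here, and the row loop is a swapped-in technique rather than the answer's apply() call:

```r
# Hypothetical helper: set the last n non-NA entries of v to NA
blank_last_n <- function(v, n) {
  v[tail(which(!is.na(v)), n)] <- NA
  v
}

# Hypothetical wrapper: apply blank_last_n() row by row to the columns in `cols`,
# reading n from the column named by `n_col`; x itself is never converted
replace_last_n_df <- function(df, cols, n_col = "x") {
  df <- as.data.frame(df)        # also accepts a data.table, returns a data.frame
  m <- as.matrix(df[cols])       # character matrix of the K columns only
  n <- df[[n_col]]
  for (i in seq_len(nrow(m))) {
    m[i, ] <- blank_last_n(m[i, ], n[i])
  }
  df[cols] <- as.data.frame(m, stringsAsFactors = FALSE)
  df
}
```

With the question's data this would be called as replace_last_n_df(d1, paste0("K", 0:6)); the result is a data.frame with x still numeric.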
Answer 2
Score: 0
This should work for your sample data, but I'm not sure whether the structure of your actual data fits. The big assumption here is that there are no NAs in the middle columns, i.e. within each row all NAs come after the last non-NA value.
library(tidyverse)
d1 %>%
  rowwise() %>%
  mutate(string_collection = list(c(K0, K1, K2, K3, K4, K5, K6))) %>%
  mutate(column_to_keep = length(na.omit(string_collection)) - x) %>% # assumes NAs only occur at the end of a row
  mutate(string_collection = list(string_collection[seq_len(column_to_keep)])) %>% # seq_len() keeps nothing when column_to_keep is 0 (1:0 would keep the first value)
  mutate(string_collection = paste0(string_collection, collapse = ",")) %>% # assumes the character columns contain no ","
  dplyr::select(x, string_collection) %>%
  separate(string_collection, into = c("K0", "K1", "K2", "K3", "K4", "K5", "K6"), sep = ",", fill = "right") %>% # fill = "right" pads short rows with NA without a warning
  relocate(x, .after = last_col()) %>% # this and the next line are optional tidy-up
  ungroup()
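To see why the no-NA-in-the-middle assumption matters, the sketch below compares truncation (what the pipeline effectively does: keep the first non-NA-count minus n values and pad with NA) with blanking the last n non-NA positions, on a hypothetical row whose second value is missing. It also shows that paste0() writes a kept NA as the literal string "NA", which separate() would then return as text rather than as a missing value:

```r
# Hypothetical row with a gap in the middle (second value missing), n = 1
row <- c("A", NA, "B", "C")
n <- 1

# Truncation (the pipeline's approach): keep the first (non-NA count - n) values,
# pad the rest with NA
keep <- sum(!is.na(row)) - n
truncated <- c(row[seq_len(keep)], rep(NA, length(row) - keep))

# Position-based: set the last n non-NA values to NA wherever they sit
positional <- row
positional[tail(which(!is.na(positional)), n)] <- NA

truncated                             # -> "A" NA NA NA  ("B" is lost too)
positional                            # -> "A" NA "B" NA
paste0(c("A", NA), collapse = ",")    # -> "A,NA" (NA becomes text)
```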