Replace last non-NA value(s) with NA [R]


Question

I have a data frame with 7 character columns (K0 to K6) and 1 integer column (x).
Column x gives the number of non-NA values, counted from the end of the row, that I want to replace with NA (in this example x goes up to 2, but it could also be higher).

library(data.table)
d1 <- data.table(K0 = c("A", "B", "C", "D", "E", "E", "E", "F", "F", "F", "G", "G"),  
                 K1 = c("H", "I", "J", "J", "J", "K", "K", "K", "1111", "1111", "L", "M"),
                 K2 = c("N", "O", "P", "P", "P", "Q", "Q", "Q", "1112", "1112", NA, "R"),
                 K3 = c("S", "T", "1113", "1113", "1113", "U", "U", "U", NA, NA, NA, "1114"),
                 K4 = c("P", NA, "1115", "1115", "1115", "1116", "1116", "1116", NA, NA, NA, NA),
                 K5 = c(NA, NA, "1117", "1117", "1117", "1118", "1118", "1118", NA, NA, NA, NA),
                 K6 = c(NA, NA, NA, NA, NA, "1119","1119","1119", NA, NA, NA, NA),
                 x = c(0,0,0,1,2,0,1,2,0,1,0,0)) 

I have tried a lot (nothing worth mentioning here), but I can't figure out how to solve this problem.

The desired output should look like this:

d2 <- data.table(K0 = c("A", "B", "C", "D", "E", "E", "E", "F", "F", "F", "G", "G"),  
                 K1 = c("H", "I", "J", "J", "J", "K", "K", "K", "1111", "1111", "L", "M"),
                 K2 = c("N", "O", "P", "P", "P", "Q", "Q", "Q", "1112", NA, NA, "R"),
                 K3 = c("S", "T", "1113", "1113", "1113", "U", "U", "U", NA, NA, NA, "1114"),
                 K4 = c("P", NA, "1115", "1115", NA, "1116", "1116", "1116", NA, NA, NA, NA),
                 K5 = c(NA, NA, "1117", NA, NA, "1118", "1118", NA, NA, NA, NA, NA),
                 K6 = c(NA, NA, NA, NA, NA, "1119",NA,NA, NA, NA, NA, NA),
                 x = c(0,0,0,1,2,0,1,2,0,1,0,0)) 

An example for clarification: rows 3, 4 and 5 are duplicates apart from K0 (J, P, 1113, 1115, 1117, NA in K1 to K6), and they should become

Row 3: C, J, P, 1113, 1115, 1117, NA, 0 (0 = no replacement needed)

Row 4: D, J, P, 1113, 1115, NA, NA, 1 (1 = last non-NA value replaced with NA)

Row 5: E, J, P, 1113, NA, NA, NA, 2 (2 = last 2 non-NA values replaced with NA)

If some R mastermind could help me out, that would be great.
(I prefer dplyr, but at this point anything will do).
Using R version 4.1.0.
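The rule in the clarification example can be stated per row: find the positions of the non-NA values and set the last x of them to NA. A minimal base-R sketch of that rule for a single character vector; `blank_last_n()` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: set the last `n` non-NA entries of `v` to NA.
blank_last_n <- function(v, n) {
  idx <- which(!is.na(v))   # positions of the non-NA values, left to right
  # tail() returns integer(0) when n == 0 (nothing changes) and all of idx
  # when n exceeds the number of non-NA values (everything becomes NA)
  v[tail(idx, n)] <- NA
  v
}

row3 <- c("C", "J", "P", "1113", "1115", "1117", NA)
blank_last_n(row3, 0)  # unchanged
blank_last_n(row3, 1)  # "1117" becomes NA
blank_last_n(row3, 2)  # "1115" and "1117" become NA
```

Because only the positions of non-NA values are used, NAs in the middle of a row are skipped rather than counted.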


Answer 1

Score: 1

One way, using apply() rather than dplyr, is to write a function that replaces the last n non-missing values with NA, then use apply() to run it on every row:

replace_last_n <- function(x){
  # The last element is the number of values to replace with NA
  n <- as.numeric(x[length(x)])
  
  # Drop that element, keeping only the K0-K6 values
  x <- x[-length(x)]
  
  # Nothing to do
  if(n == 0) return(c(x, n))

  # If n is greater than the number of non-missing values, blank everything
  # (x[] <- NA keeps the names, so apply() keeps the column names)
  if(n > length(x[!is.na(x)])) { x[] <- NA; return(c(x, n)) }
  
  # Number of non-missing values
  tmp <- length(x[!is.na(x)])
  
  # Replace the last n non-missing values with NA
  x[!is.na(x)][(tmp - n + 1):tmp] <- NA
  
  # Return the entire row, with n appended again
  return(c(x, n))
}

Here the last element of x is the number of values to replace with NA. For example:

replace_last_n(c(1,2,3,4,5, 1))
# [1]  1  2  3  4 NA  1

replace_last_n(c(1,2,3,4,NA, 1))
# [1]  1  2  3 NA NA  1

replace_last_n(c(1,2,3,NA,5, 2))
# [1]  1  2 NA NA NA  2

Then use apply() to run this function on every row:

library(dplyr)  # provides %>% and as_tibble()
apply(d1, 1, \(x) replace_last_n(x)) %>% t() %>% as_tibble()
#   K0    K1    K2    K3    K4    K5    K6    V8   
#    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#  1 A     H     N     S     P     NA    NA    0    
#  2 B     I     O     T     NA    NA    NA    0    
#  3 C     J     P     1113  1115  1117  NA    0    
#  4 D     J     P     1113  1115  NA    NA    1    
#  5 E     J     P     1113  NA    NA    NA    2    
#  6 E     K     Q     U     1116  1118  1119  0    
#  7 E     K     Q     U     1116  1118  NA    1    
#  8 F     K     Q     U     1116  NA    NA    2    
#  9 F     1111  1112  NA    NA    NA    NA    0    
# 10 F     1111  NA    NA    NA    NA    NA    1    
# 11 G     L     NA    NA    NA    NA    NA    0    
# 12 G     M     R     1114  NA    NA    NA    0  
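Because apply() first coerces the table to a character matrix, x comes back as a character column (V8 in the output). A hedged alternative sketch that keeps the original column names and types: walk the K columns from right to left, keep a running per-row count of non-NA values seen so far, and blank a value when that count is at most x. It assumes the value columns are K0 to K6 and uses a plain data.frame (a data.table can be converted with as.data.frame() first); `blank_trailing()` is a hypothetical name.

```r
# Hypothetical helper: for each row, replace the last n non-NA values
# among `cols` with NA, working column by column instead of row by row.
blank_trailing <- function(df, cols, n) {
  seen <- integer(nrow(df))               # non-NA values counted so far, per row
  for (col in rev(cols)) {                # rightmost column first
    present <- !is.na(df[[col]])
    seen <- seen + present                # count now includes this column
    df[[col]][present & seen <= n] <- NA  # value is among the last n non-NA
  }
  df
}

d1 <- data.frame(
  K0 = c("A", "B", "C", "D", "E", "E", "E", "F", "F", "F", "G", "G"),
  K1 = c("H", "I", "J", "J", "J", "K", "K", "K", "1111", "1111", "L", "M"),
  K2 = c("N", "O", "P", "P", "P", "Q", "Q", "Q", "1112", "1112", NA, "R"),
  K3 = c("S", "T", "1113", "1113", "1113", "U", "U", "U", NA, NA, NA, "1114"),
  K4 = c("P", NA, "1115", "1115", "1115", "1116", "1116", "1116", NA, NA, NA, NA),
  K5 = c(NA, NA, "1117", "1117", "1117", "1118", "1118", "1118", NA, NA, NA, NA),
  K6 = c(NA, NA, NA, NA, NA, "1119", "1119", "1119", NA, NA, NA, NA),
  x  = c(0, 0, 0, 1, 2, 0, 1, 2, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

res <- blank_trailing(d1, paste0("K", 0:6), d1$x)
res
```

The loop runs once per column (7 times here) rather than once per row, and x stays numeric.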

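Answer 2

Score: 0

This should work for your sample data, but I'm not sure whether the structure of your actual data fits. The big assumption here is that there are no NAs in the middle columns.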

library(tidyverse)
d1 %>%
  rowwise() %>%
  mutate(string_collection = list(c(K0, K1, K2, K3, K4, K5, K6))) %>%
  mutate(column_to_keep = length(na.omit(string_collection)) - x) %>%  # assumes NAs are only found at the end
  mutate(string_collection = list((string_collection)[1:column_to_keep])) %>%
  mutate(string_collection = paste0(string_collection, collapse = ",")) %>%  # assumes the character columns contain no ","
  dplyr::select(x, string_collection) %>%
  separate(string_collection, into = c("K0", "K1", "K2", "K3", "K4", "K5", "K6"), sep = ",", fill = "right") %>%  # pad missing pieces with NA without warning
  relocate(x, .after = last_col()) %>%  # optional from here: reorder columns, drop rowwise grouping
  ungroup()

Hope this helps.
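To see why the no-NA-in-the-middle assumption matters: the pipeline keeps the first (number of non-NA values minus x) columns, which matches "blank the last x non-NA values" only when every NA sits at the end of the row. A small base-R sketch comparing the two rules on one row with trailing NAs and one hypothetical row with a gap; both helper names are made up for illustration, and the `max(..., 0)` guard is an addition not present in the pipeline.

```r
# Rule used by the pipeline: keep the first (non-NA count - n) entries and
# pad the rest with NA. Correct only if NAs appear at the end of the row.
keep_first <- function(v, n) {
  keep <- max(length(na.omit(v)) - n, 0)
  c(v[seq_len(keep)], rep(NA, length(v) - keep))
}

# Rule from the question: blank the last n non-NA entries wherever they are.
blank_last_n <- function(v, n) {
  v[tail(which(!is.na(v)), n)] <- NA
  v
}

trailing_gap <- c("C", "J", "P", "1113", NA, NA, NA)  # NAs only at the end
middle_gap   <- c("A", NA, "B", "C")                  # hypothetical NA in the middle

keep_first(trailing_gap, 1)    # same result as blank_last_n(trailing_gap, 1)
blank_last_n(trailing_gap, 1)
keep_first(middle_gap, 1)      # "A" NA NA NA: "B" is dropped as well
blank_last_n(middle_gap, 1)    # "A" NA "B" NA
```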


huangapple
  • Published on 11 August 2023 at 01:36:51
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/76878105.html