Removing rows based on the values in multiple columns in R
Question
In my data, I need to remove the rows if the values in three columns (V1, V2, V3) are either a combination of 12 and NAs (like row 2) or all three of them equal 12 (like row 5). Please note that if all values equal NA (like row 3) it should remain in the data.
df <- data.frame(
  "V1" = c(NA, NA, NA, 12, 12),
  "V2" = c(55, NA, NA, 14, 12),
  "V3" = c(21, 12, NA, NA, 12),
  "V4" = c(NA, 32, NA, NA, NA),
  "V5" = c(NA, NA, 18, NA, NA)
)
  V1 V2 V3 V4 V5
1 NA 55 21 NA NA
2 NA NA 12 32 NA
3 NA NA NA NA 18
4 12 14 NA NA NA
5 12 12 12 NA NA
I would like the following result:
  V1 V2 V3 V4 V5
1 NA 55 21 NA NA
3 NA NA NA NA 18
4 12 14 NA NA NA
Thanks in advance for your help.
Answer 1
Score: 3
You can use a dual condition in filter():
library(dplyr)
df %>%
  filter(!if_all(V1:V3, ~ .x %in% c(12, NA)) | if_all(V1:V3, ~ is.na(.x)))
#   V1 V2 V3 V4 V5
# 1 NA 55 21 NA NA
# 2 NA NA NA NA 18
# 3 12 14 NA NA NA
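To see how the two conditions combine, it can help to compute them separately in base R. The following is only a sketch and not part of the answer above: the names cols, m, only_12_or_na, all_na, keep and result are made up for illustration, and df is the data frame from the question.

```r
# Data frame from the question
df <- data.frame(
  V1 = c(NA, NA, NA, 12, 12),
  V2 = c(55, NA, NA, 14, 12),
  V3 = c(21, 12, NA, NA, 12),
  V4 = c(NA, 32, NA, NA, NA),
  V5 = c(NA, NA, 18, NA, NA)
)

cols <- c("V1", "V2", "V3")
m <- as.matrix(df[, cols])

# TRUE where every value in V1:V3 is either 12 or NA
only_12_or_na <- rowSums(is.na(m) | m == 12, na.rm = TRUE) == length(cols)

# TRUE where every value in V1:V3 is NA
all_na <- rowSums(is.na(m)) == length(cols)

# Drop rows made up only of 12s and NAs, but always keep all-NA rows
keep <- !only_12_or_na | all_na
result <- df[keep, ]
```

With the question's data, keep is TRUE for rows 1, 3 and 4, which matches the requested result. Unlike filter(), base subsetting keeps the original row names.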
Answer 2
Score: 1
First set a col variable storing the target column names. A row is dropped when the number of its values in those columns that are NA or 12 equals length(col), that is, when every value is NA or 12.
col <- c("V1", "V2", "V3")
df[apply(df[, col], 1, \(x) sum((is.na(x) | x == 12), na.rm = TRUE) != length(col)), ]
Or
df[rowSums(is.na(df[, col]) | df[, col] == 12, na.rm = TRUE) < length(col), ]
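Applied to the question's data, this first version also drops row 3, where V1 to V3 are all NA, so it does not yet honour the "keep all-NA rows" requirement. A quick check, assuming df as defined in the question (first_try is a name introduced here for illustration):

```r
# Data frame from the question
df <- data.frame(
  V1 = c(NA, NA, NA, 12, 12),
  V2 = c(55, NA, NA, 14, 12),
  V3 = c(21, 12, NA, NA, 12),
  V4 = c(NA, 32, NA, NA, NA),
  V5 = c(NA, NA, 18, NA, NA)
)
col <- c("V1", "V2", "V3")

# Same rowSums expression as above, stored so the result can be inspected
first_try <- df[rowSums(is.na(df[, col]) | df[, col] == 12, na.rm = TRUE) < length(col), ]

# Only rows 1 and 4 survive; row 3 (all NA in V1:V3) is removed as well
rownames(first_try)
```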
Update: To remove only the rows that either contain both 12 and NA (and nothing else) or have all values equal to 12, use the following code:
df[apply(df[, col], 1, \(x) !((sum((is.na(x) | x == 12), na.rm = TRUE) == length(col)) &
                              (sum(is.na(x)) >= 1 & sum(x == 12, na.rm = TRUE) >= 1) |
                              sum(x == 12, na.rm = TRUE) == length(col))), ]
Output
  V1 V2 V3 V4 V5
1 NA 55 21 NA NA
3 NA NA NA NA 18
4 12 14 NA NA NA
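The condition in the update can also be expressed with per-row counts, since "only 12s and NAs, with at least one 12" covers both the mixed case and the all-12 case. This is a sketch of an equivalent vectorized form rather than the answerer's code; vals, n_na, n_12, to_drop and simplified are hypothetical names, and df is the question's data frame.

```r
# Data frame from the question
df <- data.frame(
  V1 = c(NA, NA, NA, 12, 12),
  V2 = c(55, NA, NA, 14, 12),
  V3 = c(21, 12, NA, NA, 12),
  V4 = c(NA, 32, NA, NA, NA),
  V5 = c(NA, NA, 18, NA, NA)
)
col <- c("V1", "V2", "V3")
vals <- as.matrix(df[, col])

n_na <- rowSums(is.na(vals))               # number of NAs per row
n_12 <- rowSums(vals == 12, na.rm = TRUE)  # number of 12s per row

# Drop a row when V1:V3 hold nothing but 12s and NAs and at least one 12
to_drop <- (n_na + n_12 == length(col)) & (n_12 >= 1)
simplified <- df[!to_drop, ]
```

On the question's data this keeps rows 1, 3 and 4, the same rows as the apply() version.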