Removing rows based on the values in multiple columns in R

Question

In my data, I need to remove rows where the values in three columns (V1, V2, V3) are either a combination of 12 and NA (like row 2) or all equal to 12 (like row 5). Note that a row where all three values are NA (like row 3) should remain in the data.

df <- data.frame(
  "V1" = c(NA, NA, NA, 12, 12),
  "V2" = c(55, NA, NA, 14, 12),
  "V3" = c(21, 12, NA, NA, 12),
  "V4" = c(NA, 32, NA, NA, NA),
  "V5" = c(NA, NA, 18, NA, NA)
)
     V1 V2 V3 V4 V5 
1    NA 55 21 NA NA
2    NA NA 12 32 NA
3    NA NA NA NA 18
4    12 14 NA NA NA
5    12 12 12 NA NA

I would like the following result:

     V1 V2 V3 V4 V5 
1    NA 55 21 NA NA
3    NA NA NA NA 18
4    12 14 NA NA NA

Thanks in advance for your help.
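Put differently, a row should be dropped exactly when every value in V1:V3 is 12 or NA and at least one of them is 12; an all-NA row fails the second test, so it stays. A minimal sketch of that per-row check, using a hypothetical helper drop_row() (not part of the question or answers) applied to the V1:V3 values of the five example rows:

```r
# Hypothetical helper: x holds one row's V1, V2, V3 values.
drop_row <- function(x) {
  # is.na(x) | x == 12 is TRUE for NA cells (TRUE | NA is TRUE) and for 12
  only_12_or_na <- all(is.na(x) | x == 12)
  # At least one value is exactly 12; NA comparisons are ignored
  has_12 <- any(x == 12, na.rm = TRUE)
  only_12_or_na && has_12
}

# V1:V3 values of rows 1 to 5 of the example data
rows <- list(
  c(NA, 55, 21),
  c(NA, NA, 12),
  c(NA, NA, NA),
  c(12, 14, NA),
  c(12, 12, 12)
)
dropped <- vapply(rows, drop_row, logical(1))
print(dropped)
# [1] FALSE  TRUE FALSE FALSE  TRUE
```

The has_12 test is what keeps row 3: its values are all NA, so no value equals 12.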

Answer 1

Score: 3

You can combine two conditions in filter(): keep a row if not all of V1:V3 are 12 or NA, or if all of them are NA.

library(dplyr)

df %>%
  filter(!if_all(V1:V3, ~ .x %in% c(12, NA)) | if_all(V1:V3, ~ is.na(.x)))

#   V1 V2 V3 V4 V5
# 1 NA 55 21 NA NA
# 2 NA NA NA NA 18
# 3 12 14 NA NA NA
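For readers not using dplyr, the same two conditions can be sketched in base R, assuming the df from the question. Like the dplyr version, it relies on %in% (which is built on match()) treating NA as a matchable value, so NA %in% c(12, NA) is TRUE, whereas NA == 12 is NA.

```r
df <- data.frame(
  V1 = c(NA, NA, NA, 12, 12),
  V2 = c(55, NA, NA, 14, 12),
  V3 = c(21, 12, NA, NA, 12),
  V4 = c(NA, 32, NA, NA, NA),
  V5 = c(NA, NA, 18, NA, NA)
)
target <- df[, c("V1", "V2", "V3")]

# Condition 1 (first if_all): every target value is 12 or NA.
# vapply() applies %in% column by column and returns a rows x columns logical matrix.
is_12_or_na <- vapply(target, function(v) v %in% c(12, NA), logical(nrow(target)))
all_12_or_na <- rowSums(is_12_or_na) == ncol(target)

# Condition 2 (second if_all): every target value is NA.
all_na <- rowSums(is.na(target)) == ncol(target)

# Keep a row unless condition 1 holds, except when condition 2 also holds
result <- df[!all_12_or_na | all_na, ]
print(result)
#   V1 V2 V3 V4 V5
# 1 NA 55 21 NA NA
# 3 NA NA NA NA 18
# 4 12 14 NA NA NA
```

Unlike the dplyr output above, base subsetting keeps the original row names 1, 3 and 4.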

Answer 2

Score: 1

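First, store the target column names in a col variable. A row is then removed when the number of its values that are NA or 12 equals length(col), i.e. when every target value is NA or 12. Note that this first version also removes rows where all three values are NA, such as row 3.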

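col <- c("V1", "V2", "V3")

# \(x) is the anonymous-function shorthand introduced in R 4.1.0
df[apply(df[, col], 1, \(x) sum(is.na(x) | x == 12, na.rm = TRUE) != length(col)), ]

Or

# Vectorized: count the NA-or-12 values per row and keep rows with fewer than length(col)
df[rowSums(is.na(df[, col]) | df[, col] == 12, na.rm = TRUE) < length(col), ]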
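A quick check, assuming the df from the question, shows that this first version keeps only rows 1 and 4: the all-NA row 3 is dropped together with rows 2 and 5, because every one of its values counts as "NA or 12".

```r
df <- data.frame(
  V1 = c(NA, NA, NA, 12, 12),
  V2 = c(55, NA, NA, 14, 12),
  V3 = c(21, 12, NA, NA, 12),
  V4 = c(NA, 32, NA, NA, NA),
  V5 = c(NA, NA, 18, NA, NA)
)
col <- c("V1", "V2", "V3")

# Per-row count of target values that are NA or 12 (TRUE | NA is TRUE, so NA cells count)
n_12_or_na <- rowSums(is.na(df[, col]) | df[, col] == 12, na.rm = TRUE)

first_version <- df[n_12_or_na < length(col), ]
print(rownames(first_version))
# [1] "1" "4"
```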

Update: To remove only the rows whose V1:V3 values are a mix of 12 and NA (at least one of each, nothing else) or are all equal to 12, while keeping all-NA rows, use the following code:

df[apply(df[, col], 1, \(x) !((sum(is.na(x) | x == 12, na.rm = TRUE) == length(col)) &  # every value is NA or 12
                               (sum(is.na(x)) >= 1 & sum(x == 12, na.rm = TRUE) >= 1) |  # ...with at least one NA and one 12
                                sum(x == 12, na.rm = TRUE) == length(col))), ]            # or every value is 12

Output

  V1 V2 V3 V4 V5
1 NA 55 21 NA NA
3 NA NA NA NA 18
4 12 14 NA NA NA
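The apply() call in the update can also be sketched in vectorized form with rowSums(). Since "all values are 12" is a special case of "all values are NA or 12, with at least one 12", the two removal conditions collapse into one: drop a row when its 12s and NAs together fill all of col and it contains at least one 12. This assumes the df from the question; n_12, n_na and to_drop are illustrative names.

```r
df <- data.frame(
  V1 = c(NA, NA, NA, 12, 12),
  V2 = c(55, NA, NA, 14, 12),
  V3 = c(21, 12, NA, NA, 12),
  V4 = c(NA, 32, NA, NA, NA),
  V5 = c(NA, NA, 18, NA, NA)
)
col <- c("V1", "V2", "V3")
m <- as.matrix(df[, col])

n_12 <- rowSums(m == 12, na.rm = TRUE)  # values equal to 12 in each row
n_na <- rowSums(is.na(m))               # NA values in each row

# Drop when every value is 12 or NA and at least one is 12;
# an all-NA row has n_12 == 0, so it is kept.
to_drop <- (n_12 + n_na == length(col)) & (n_12 >= 1)
result <- df[!to_drop, ]
print(result)
```

This keeps rows 1, 3 and 4 of the question's data, and avoids calling a function once per row.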

huangapple
  • Posted on 3 March 2023 at 22:51:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75628564.html