Remove all rows containing certain strings in R
Question

I want to remove every row of a data frame that contains certain strings. The strings (call them "abc1", "abc2", "abc3", and so on) appear in different columns at different rows. For example, "abc1" may appear in the first column at row 15 and again in the second column at row 20. I want to delete every row that contains any of these strings in any column. The solutions I found all assume that a single variable holds the strings in question. How do I do this efficiently when the strings appear under more than one variable?

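Answer 1

Score: 2

Use filter with if_any to loop over the character columns and check with str_detect whether each element starts with abc followed by one or more digits. Negating the result (!) keeps only the rows in which no character column matches.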

library(dplyr)
library(stringr)
df1 %>%
   filter(!if_any(where(is.character), ~ str_detect(.x, "^abc\\d+")))

-output

   col1 col2 col3
1  ac2    3   5d
2   4d    4   3c

Or using base R

subset(df1, !Reduce(`|`, lapply(Filter(is.character, df1),
    grepl, pattern = "^abc\\d+")))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
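
If the strings to remove are known exactly rather than described by a pattern, the same Reduce idiom can test each column with %in% instead of grepl. The following is a minimal sketch under that assumption; the names targets, hit, and result are hypothetical and not part of the original answer, and df1 is rebuilt here so the block runs on its own.

```r
# Same example data as in the answer, rebuilt so this block is self-contained.
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"),
                  col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# Hypothetical vector of exact cell values that should cause a row to be dropped.
targets <- c("abc1", "abc2", "abc3")

# For each column, flag the cells equal to any target, then OR the columns
# together to get one logical value per row.
hit <- Reduce(`|`, lapply(df1, function(col) col %in% targets))

# Keep only the rows with no hit in any column.
result <- df1[!hit, ]
result
```

Because %in% compares whole cell values, a cell such as "abc10" or "xabc1" is left alone; the pattern-based versions are the better fit when partial matches should count.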

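Or paste each row into a single string and search it once (this pattern has no ^ anchor, so it matches abc followed by digits anywhere in the row)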

subset(df1, !grepl("abc\\d+", do.call(paste, df1)))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
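
The pasted-row version is not quite equivalent to the per-column versions: its pattern is unanchored, so it matches abc followed by digits anywhere in the row, not only at the start of a cell. A small sketch with hypothetical data (df2 and the variable names are illustrative, not from the original answer) shows where the two can disagree:

```r
# Hypothetical data: "xabc1" contains the pattern but does not start with it.
df2 <- data.frame(col1 = c("xabc1", "abc2", "ok"),
                  col2 = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# Per-column check, anchored at the start of each cell.
anchored <- Reduce(`|`, lapply(Filter(is.character, df2),
                               grepl, pattern = "^abc\\d+"))

# Row-wise check on the pasted string, unanchored.
pasted <- grepl("abc\\d+", do.call(paste, df2))

anchored  # FALSE  TRUE FALSE
pasted    #  TRUE  TRUE FALSE
```

Which behaviour is wanted depends on the data; if only cells that begin with the string should count, the per-column form is the safer choice.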

Data

df1 <- structure(list(col1 = c("abc1", "xyz1", "ac2", "4d"), col2 = 1:4, 
    col3 = c("1d", "abc3", "5d", "3c")), class = "data.frame", row.names = c(NA, 
-4L))

huangapple
  • Posted on February 27, 2023, 09:48:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75576171.html