Remove all rows containing certain strings in R
Question
I want to remove all rows from a data frame that contain certain strings. The strings (call them "abc1", "abc2", "abc3" and so forth) appear under different columns at different rows in the dataset. For example, "abc1" may appear in the first column at row 15 and again in the second column at row 20. I want to delete every row that contains any of these strings. The solutions I looked at assume that a single variable contains the strings in question. How do I do this efficiently when the strings appear under more than one variable?
Answer 1
Score: 2
We may use filter with if_any to loop over the character columns, check with str_detect whether any element starts with abc followed by one or more digits, and negate (!) the result so that only rows without such elements are returned.
library(dplyr)
library(stringr)
# keep rows where no character column has a value starting with "abc" + digits
df1 %>%
  filter(!if_any(where(is.character), ~ str_detect(.x, "^abc\\d+")))
-output
col1 col2 col3
1 ac2 3 5d
2 4d 4 3c
Or using base R
subset(df1, !Reduce(`|`, lapply(Filter(is.character, df1),
                                grepl, pattern = "^abc\\d+")))
col1 col2 col3
3 ac2 3 5d
4 4d 4 3c
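Or we may also paste the columns of each row together and search the combined string (the pattern is not anchored with ^ here, since a match can occur anywhere in the pasted string)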
subset(df1, !grepl("abc\\d+", do.call(paste, df1)))
col1 col2 col3
3 ac2 3 5d
4 4d 4 3c
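To illustrate why the per-column patterns are anchored with ^ while the pasted-row version is not, here is a small base R sketch. The vector x is a made-up example, and df1 is rebuilt from the data in this answer so that the block runs on its own.

```r
# Hypothetical test strings, not part of the original data
x <- c("abc1", "xabc2", "abc", "ac2")

anchored   <- grepl("^abc\\d+", x)  # value must start with "abc" + digit(s)
unanchored <- grepl("abc\\d+", x)   # "abc" + digit(s) anywhere in the value
print(anchored)    # TRUE FALSE FALSE FALSE
print(unanchored)  # TRUE  TRUE FALSE FALSE

# Same data as in this answer
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"),
                  col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# One string per row: "abc1 1 1d", "xyz1 2 abc3", ...
pasted <- do.call(paste, df1)

# An anchored pattern only sees the start of the pasted string (first column),
# so it misses "abc3" in col3 of row 2
missed <- grepl("^abc\\d+", pasted)  # TRUE FALSE FALSE FALSE
found  <- grepl("abc\\d+", pasted)   # TRUE  TRUE FALSE FALSE
```

The unanchored form also matches values such as "xabc2"; if that is not wanted, checking each column separately with the anchored pattern is the stricter choice.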
data
df1 <- structure(list(col1 = c("abc1", "xyz1", "ac2", "4d"), col2 = 1:4,
col3 = c("1d", "abc3", "5d", "3c")), class = "data.frame", row.names = c(NA,
-4L))
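If the strings to drop are a known, fixed set rather than a pattern, an exact match with %in% avoids regular expressions altogether. The following base R sketch assumes the targets are exactly "abc1", "abc2" and "abc3" (the names used in the question); the variable names targets and hit are only illustrative.

```r
# Same data as above, rebuilt so the block is self-contained
df1 <- data.frame(col1 = c("abc1", "xyz1", "ac2", "4d"),
                  col2 = 1:4,
                  col3 = c("1d", "abc3", "5d", "3c"),
                  stringsAsFactors = FALSE)

# Assumed set of exact strings to look for
targets <- c("abc1", "abc2", "abc3")

# For each column, flag the rows whose value equals one of the targets,
# then combine the per-column flags with a row-wise OR
hit <- Reduce(`|`, lapply(df1, function(col) col %in% targets))

# Keep only the rows with no hit in any column
res <- df1[!hit, ]
print(res)
```

Unlike the regex versions, this only removes rows where a cell is exactly equal to one of the listed strings, so a value like "abc10" would be kept.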