2023-02-27 09:48:42

Remove all rows containing certain strings in R

Question

I want to remove all rows from a data frame that contain certain strings. The strings (call them "abc1", "abc2", "abc3" and so forth) appear in different columns at different rows of the dataset. For example, "abc1" may appear in the first column at row 15 and then in the second column at row 20. I want to delete every row that contains any of these strings. The solutions I have looked at assume that a single variable contains the strings in question. How do I do this efficiently when the strings appear under more than one variable?

Answer 1

Score: 2

We can use filter with if_any to loop over the character columns, check with str_detect whether an element starts with "abc" followed by one or more digits, and negate the result (!) so that only rows without any such element are returned.

library(dplyr)
library(stringr)
df1 %>%
   filter(!if_any(where(is.character), ~ str_detect(.x, "^abc\\d+")))

-output

   col1 col2 col3
1  ac2    3   5d
2   4d    4   3c

Or using base R

subset(df1, !Reduce(`|`, lapply(Filter(is.character, df1),
    grepl, pattern = "^abc\\d+")))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
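
To make the base R one-liner easier to follow, it can be unrolled into separate steps. This is only an illustrative sketch: the intermediate names `char_cols`, `hits`, and `drop_row` are not from the answer, and `df1` is rebuilt with `data.frame()` so the block runs on its own.

```r
# Same example data as in the "data" section, rebuilt with data.frame()
df1 <- data.frame(
  col1 = c("abc1", "xyz1", "ac2", "4d"),
  col2 = 1:4,
  col3 = c("1d", "abc3", "5d", "3c"),
  stringsAsFactors = FALSE
)

# 1. Keep only the character columns (a data frame is a list of columns)
char_cols <- Filter(is.character, df1)

# 2. One logical vector per column: TRUE where the value starts with "abc" + digits
hits <- lapply(char_cols, grepl, pattern = "^abc\\d+")

# 3. Combine the per-column vectors with element-wise OR: one flag per row
drop_row <- Reduce(`|`, hits)

# 4. Keep the rows where no character column matched
result <- df1[!drop_row, ]
```

Because the rows are selected by a logical index, the original row names (3 and 4) are preserved, which matches the subset output above.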

Or may also do

subset(df1, !grepl("abc\\d+", do.call(paste, df1)))
  col1 col2 col3
3  ac2    3   5d
4   4d    4   3c
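
One difference is worth noting. The pattern in this last variant has no `^` anchor and is matched against the whole pasted row, so it also flags rows where "abc" plus digits occurs in the middle of a value. The sketch below uses a made-up value "xabc1" and a hypothetical data frame `df2` (neither is part of the original data) to show the two behaviours side by side.

```r
# Hypothetical data: "xabc1" contains "abc1" but does not start with it
df2 <- data.frame(
  col1 = c("xabc1", "ac2"),
  col2 = 1:2,
  col3 = c("1d", "5d"),
  stringsAsFactors = FALSE
)

# Each row is collapsed into one space-separated string
row_strings <- do.call(paste, df2)

# Unanchored match on the pasted row flags the first row
unanchored <- grepl("abc\\d+", row_strings)

# Anchored, per-column match does not flag it
anchored <- Reduce(`|`, lapply(Filter(is.character, df2),
                               grepl, pattern = "^abc\\d+"))
```

Which behaviour is wanted depends on whether "contains" in the question means the value starts with the string or merely includes it somewhere.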

data

df1 <- structure(list(col1 = c("abc1", "xyz1", "ac2", "4d"), col2 = 1:4, 
    col3 = c("1d", "abc3", "5d", "3c")), class = "data.frame", row.names = c(NA, 
-4L))
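
If the strings to remove do not share a regular pattern, an exact-match lookup may be simpler than a regular expression. The following is a sketch under that assumption: `targets` is a hypothetical name for the vector of strings, and the comparison is on whole cell values, not substrings.

```r
# Same example data, rebuilt with data.frame() to keep the block self-contained
df1 <- data.frame(
  col1 = c("abc1", "xyz1", "ac2", "4d"),
  col2 = 1:4,
  col3 = c("1d", "abc3", "5d", "3c"),
  stringsAsFactors = FALSE
)

# Hypothetical set of exact strings to remove
targets <- c("abc1", "abc2", "abc3")

# For each column, flag cells whose value is one of the targets,
# then OR the columns together to get one flag per row
has_target <- Reduce(`|`, lapply(df1, function(col) col %in% targets))

# Keep the rows where no cell matched
result <- df1[!has_target, ]
```

Here all columns are checked, including the integer one; `%in%` simply returns FALSE for values that are not in `targets`.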
