How can I remove rows of a dataframe that contain two specific characters?


Question

I have a character column in a data frame where each value is a string of 7 letters, each one of A, B, C, or D.

For example:

1 - AAAABBB
2 - ACBCDAB
3 - AACCADD
4 - ACDCACC
5 - ABAABBC
6 - BCBBDCB

I would like to remove the rows of the data frame that contain both an A and a B,
but keep any rows that contain only A's, C's, and D's or only B's, C's, and D's.

So the end result should be:

3 - AACCADD
4 - ACDCACC
6 - BCBBDCB

The number of A's and B's in a cell does not matter: as long as the cell has at least one A and at least one B, I would like to remove that row.

I've tried using str_split_fixed and then subsetting the various columns, but I feel like there should be a more efficient way.

Answer 1

Score: 0

With base R:

subset(your_data, !(grepl("A", your_column) & grepl("B", your_column)))
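
To check the base R approach against the sample data from the question, here is a minimal sketch. The names `df`, `id`, and `code` are placeholders I made up for illustration; swap in your own data frame and column.

```r
# Sample data from the question; `df`, `id`, and `code` are made-up names.
df <- data.frame(
  id = 1:6,
  code = c("AAAABBB", "ACBCDAB", "AACCADD", "ACDCACC", "ABAABBC", "BCBBDCB"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose string contains at least one A and at least one B.
# fixed = TRUE treats "A" and "B" as literal text rather than regexes.
has_both <- grepl("A", df$code, fixed = TRUE) & grepl("B", df$code, fixed = TRUE)

# Keep every row that does not contain both letters.
result <- df[!has_both, ]
print(result)
```

This should leave rows 3, 4, and 6, which matches the result asked for in the question.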

Or with the tidyverse:

library(stringr)
library(dplyr)
your_data |>
  filter(!(str_detect(your_column, "A") & str_detect(your_column, "B")))
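
If you would rather test with one pattern instead of two calls, a possible alternative (my own variation, not part of the answer above) is a regex that matches an A followed somewhere by a B, or a B followed somewhere by an A. The vector `x` and the name `both_pattern` are made up for this sketch.

```r
# Strings from the question; `x` is a made-up name for illustration.
x <- c("AAAABBB", "ACBCDAB", "AACCADD", "ACDCACC", "ABAABBC", "BCBBDCB")

# Matches any string that has an A before a B, or a B before an A,
# i.e. any string containing both letters in either order.
both_pattern <- "A.*B|B.*A"

# Drop the strings that match, keep the rest.
kept <- x[!grepl(both_pattern, x)]
print(kept)
```

The same pattern should also work as the second argument of `str_detect()` inside `filter()`. Whether it is clearer than two separate checks is a matter of taste.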
