Filter row with one specific string value in R

Question

I have a dataframe in R as below:

    Fruits
    Apple:1
    Apple:4
    Banana
    Papaya
    Orange, Apple:2

I want to keep only the rows where Apple is the only entry:

    Apple:1
    Apple:4

I tried this with the dplyr package:

    df <- dplyr::filter(df, grepl('Apple', Fruits))

But that keeps every row containing the string Apple, including the one that also lists another fruit:

    Apple:1
    Apple:4
    Orange, Apple:2

How can I drop the rows that contain multiple strings and keep only the rows with one specific string (in this case Apple)?

Answer 1

Score: 2

To keep only the Apple rows, anchor the pattern. The regex anchor ^ marks the start of the string; it is followed by "Apple:" and one or more digits, with an optional ", " separator, and that group may repeat one or more times. The pattern closes with $, which marks the end of the string. Because the whole string has to match, grepl() returns FALSE as soon as any other characters appear in it.

    library(dplyr)

    # Keep rows made up solely of "Apple:<digits>" entries separated by ", "
    df %>% filter(grepl("^(Apple:\\d+(, )?){1,}$", Fruits))
    #>    Fruits
    #> 1 Apple:1
    #> 2 Apple:4
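To see the effect of the anchors without dplyr, here is a minimal base R sketch. It assumes the column is a plain character vector holding the values from the question; the names `fruits`, `loose`, and `strict` are made up for illustration.

```r
# Sample values modeled on the question's Fruits column (assumed character data)
fruits <- c("Apple:1", "Apple:4", "Banana", "Papaya", "Orange, Apple:2")

# Unanchored pattern: TRUE for every string that contains "Apple" anywhere
loose <- grepl("Apple", fruits)

# Anchored pattern: TRUE only when the whole string consists of
# "Apple:<digits>" entries, optionally separated by ", "
strict <- grepl("^(Apple:\\d+(, )?){1,}$", fruits)

print(fruits[loose])
print(fruits[strict])
```

Two caveats about the anchored pattern: it rejects a bare "Apple" with no ":<digits>" suffix, and it accepts a trailing ", " (for example "Apple:1, "), so it may need adjusting for data in a different shape.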

Answer 2

Score: 1

EDIT:

Assuming, based on the OP's comments, that a row should be kept only when Apple is the only fruit mentioned, and assuming further that the list of non-Apple fruits is manageable, you could do this:

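    library(dplyr)
    library(stringr)

    # Group the alternatives so that .* applies to both Banana and Orange
    df %>%
      filter(str_detect(Fruits, '^(?!.*(Banana|Orange)).*Apple'))
    #>                    Fruits
    #> 1 Apple, Apple:2, Apple:7

Here, the negative look-ahead (?!.*(Banana|Orange)) asserts that neither Banana nor Orange appears anywhere in the string alongside Apple. The inner parentheses matter: without them, (?!.*Banana|Orange) only rejects Orange when it sits at the very start of the string, so a value such as "Apple:1, Orange:2" would slip through.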
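How alternation binds inside a look-ahead is easy to get wrong, so here is a small base R check comparing the ungrouped and grouped forms. It is only a sketch: it uses grepl() with perl = TRUE instead of str_detect(), and the test vector `x` adds one hypothetical value, "Apple:1, Orange:2", that is not in the data below.

```r
# Test strings; "Apple:1, Orange:2" is a hypothetical extra case
x <- c("Orange, Apple:2",
       "Apple:1, Orange:2",
       "Apple:2, Banana:10",
       "Apple, Apple:2, Apple:7")

# Ungrouped: the alternatives are ".*Banana" and a bare "Orange",
# so Orange is only rejected at the very start of the string
ungrouped <- "^(?!.*Banana|Orange).*Apple"

# Grouped: ".*" applies to both names, so either one is rejected anywhere
grouped <- "^(?!.*(Banana|Orange)).*Apple"

res_ungrouped <- grepl(ungrouped, x, perl = TRUE)
res_grouped <- grepl(grouped, x, perl = TRUE)

print(data.frame(x, res_ungrouped, res_grouped))
```

The two patterns disagree only on "Apple:1, Orange:2", which the ungrouped form lets through because Orange is not the first thing in the string.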

Data:

    df <- data.frame(
      Fruits = c("Orange, Apple:2",
                 "Apple, Apple:2, Apple:7",
                 "Apple:2, Banana:10"))
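If the list of non-Apple fruits is not manageable, one alternative is to split each string on commas and require every piece to be an Apple entry. This is a different technique from the look-ahead, not something either answer proposes, and it assumes entries are comma-separated and look like "Apple" or "Apple:<digits>". The names `items`, `only_apple`, and `result` are illustrative.

```r
# Same data as above; stringsAsFactors = FALSE keeps Fruits as character on older R
df <- data.frame(
  Fruits = c("Orange, Apple:2",
             "Apple, Apple:2, Apple:7",
             "Apple:2, Banana:10"),
  stringsAsFactors = FALSE
)

# Split each string on commas, swallowing any spaces around them
items <- strsplit(df$Fruits, "\\s*,\\s*")

# A row qualifies only if every piece is "Apple", optionally followed by ":<digits>"
only_apple <- vapply(
  items,
  function(parts) all(grepl("^Apple(:\\d+)?$", parts)),
  logical(1)
)

result <- df[only_apple, , drop = FALSE]
print(result)
```

This keeps the check independent of which other fruits might appear, at the cost of assuming a consistent separator.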

huangapple
  • Posted on April 13, 2023, 20:02:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76005176.html