Remove rows from a data frame that match on multiple criteria

Question

I want to remove rows of my data frame that match a specific pattern, using tidyverse syntax if possible.

I want to remove rows where col1 contains "cat" and any of col2 through col4 contains one of the following words: dog, fox, or cow. For this example, that removes rows 1 and 4 from the original data.

Here's a sample dataset:

df <- data.frame(col1 = c("cat", "fox", "dog", "cat", "pig"),
                 col2 = c("lion", "tiger", "elephant", "dog", "cow"),
                 col3 = c("bird", "cow", "sheep", "fox", "dog"),
                 col4 = c("dog", "cat", "cat", "cow", "fox"))

I've tried a number of across variants but constantly run into issues. Here is my latest attempt:

filtered_df <- df %>%
  filter(!(col1 == "cat" & !any(cowfoxdog <- across(col2:col4, ~ . %in% c("cow", "fox", "dog")))))

This returns the following error:

Error in `filter()`:
! Problem while computing `..1 = !...`.
Caused by error in `FUN()`:
! only defined on a data frame with all numeric variables

Answer 1

Score: 5

You can use if_any(). For a more robust test, I first added a row where col1 == "cat" but "dog", "fox", and "cow" don't appear in col2 through col4.

library(dplyr)

df <- df %>%
  add_row(col1 = "cat", col2 = "sheep", col3 = "lion", col4 = "tiger")

df %>%
  filter(!(col1 == "cat" & if_any(col2:col4, \(x) x %in% c("dog", "fox", "cow"))))
  col1     col2  col3  col4
1  fox    tiger   cow   cat
2  dog elephant sheep   cat
3  pig      cow   dog   fox
4  cat    sheep  lion tiger
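
To see why if_any() works where any() did not, it may help to spell out the row-wise logic in base R. The sketch below is an illustration, not part of the answer above: any() collapses its input to a single TRUE/FALSE, while if_any() behaves roughly like an element-wise OR across the selected columns, giving one value per row. The names targets, per_column, hit, and collapsed are my own.

```r
# Original sample data from the question (without the extra test row).
df <- data.frame(col1 = c("cat", "fox", "dog", "cat", "pig"),
                 col2 = c("lion", "tiger", "elephant", "dog", "cow"),
                 col3 = c("bird", "cow", "sheep", "fox", "dog"),
                 col4 = c("dog", "cat", "cat", "cow", "fox"))

targets <- c("dog", "fox", "cow")  # illustrative name

# One logical vector per column, then combine them element-wise with `|`.
# This yields one TRUE/FALSE per row, roughly what if_any() computes.
per_column <- lapply(df[c("col2", "col3", "col4")], function(x) x %in% targets)
hit <- Reduce(`|`, per_column)

# any() instead collapses everything to a single value, so it cannot
# tell one row from another.
collapsed <- any(unlist(per_column))

# Drop rows where col1 is "cat" AND at least one of col2:col4 matched.
result <- df[!(df$col1 == "cat" & hit), ]
```

On the original five rows, this keeps the fox, dog, and pig rows, matching the question's expectation that rows 1 and 4 are removed.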

Answer 2

Score: 1

One way is to use the filter() function with logical operators to drop the rows that match your criteria:

library(tidyverse)

pattern1 <- c("cat")
pattern2 <- c("dog", "fox", "cow")

df %>%
  filter(!(col1 == pattern1 &
             (col2 %in% pattern2 |
              col3 %in% pattern2 |
              col4 %in% pattern2))
         )

  col1     col2  col3 col4
1  fox    tiger   cow  cat
2  dog elephant sheep  cat
3  pig      cow   dog  fox
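
One caveat worth checking on real data, offered as an aside and not as part of the answer: if col1 can contain NA, then col1 == pattern1 evaluates to NA for those rows, and filter() drops rows whose condition is NA. Using %in% for the first comparison as well avoids this, since %in% never returns NA. The base R sketch below only shows the logical values involved; the vectors are made up for illustration.

```r
# Made-up values for illustration: the second entry of col1 is missing.
col1 <- c("cat", NA, "fox")
hit  <- c(TRUE, TRUE, FALSE)   # pretend result of the col2:col4 check

# `==` propagates NA, so the negated condition is NA for the second row.
keep_eq <- !(col1 == "cat" & hit)

# `%in%` returns FALSE for NA, so the second row would be kept.
keep_in <- !(col1 %in% "cat" & hit)
```

The sample data in the question has no missing values, so both forms give the same result there.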
