Remove rows from a data frame that match on multiple criteria

Question

I want to remove the rows of my data frame that match a specific pattern, using tidyverse syntax if possible.

I want to remove rows where col1 is "cat" and any of col2 to col4 contains one of the following words: dog, fox, or cow. In this example, that removes rows 1 and 4 of the original data.

Here's a sample dataset:

    df <- data.frame(col1 = c("cat", "fox", "dog", "cat", "pig"),
                     col2 = c("lion", "tiger", "elephant", "dog", "cow"),
                     col3 = c("bird", "cow", "sheep", "fox", "dog"),
                     col4 = c("dog", "cat", "cat", "cow", "fox"))

I've tried a number of across variants but constantly run into issues. Here is my latest attempt:

    filtered_df <- df %>%
      filter(!(col1 == "cat" & !any(cowfoxdog <- across(col2:col4, ~ . %in% c("cow", "fox", "dog")))))

This returns the following error:

    Error in `filter()`:
    ! Problem while computing `..1 = !...`.
    Caused by error in `FUN()`:
    ! only defined on a data frame with all numeric variables

Answer 1

Score: 5

You can use if_any(). To make the test more robust, I first added a row where col1 == "cat" but none of "dog", "fox", or "cow" appears in col2:col4.

    library(dplyr)

    # Add a "cat" row with no match in col2:col4; it should be kept
    df <- df %>%
      add_row(col1 = "cat", col2 = "sheep", col3 = "lion", col4 = "tiger")

    # Drop rows where col1 is "cat" and any of col2:col4 is a target word
    df %>%
      filter(!(col1 == "cat" & if_any(col2:col4, \(x) x %in% c("dog", "fox", "cow"))))

      col1     col2  col3  col4
    1  fox    tiger   cow   cat
    2  dog elephant sheep   cat
    3  pig      cow   dog   fox
    4  cat    sheep  lion tiger
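
To see what the condition evaluates to for each row, the two parts can be written out as helper columns before filtering. This is only a sketch for inspection, not part of the original answer: the names targets, is_cat, has_target, drop_row, flagged, and kept are made up for illustration, and it assumes R 4.1 or later (for the \(x) lambda) and a dplyr version in which if_any() is available inside mutate() (1.0.4 or later, as far as I know). It uses the original five-row data from the question.

```r
library(dplyr)

# Sample data from the question (original five rows)
df <- data.frame(
  col1 = c("cat", "fox", "dog", "cat", "pig"),
  col2 = c("lion", "tiger", "elephant", "dog", "cow"),
  col3 = c("bird", "cow", "sheep", "fox", "dog"),
  col4 = c("dog", "cat", "cat", "cow", "fox")
)

# Hypothetical helper: the words to look for in col2:col4
targets <- c("dog", "fox", "cow")

# Materialize each part of the condition as its own column
flagged <- df %>%
  mutate(
    is_cat     = col1 == "cat",                            # first criterion
    has_target = if_any(col2:col4, \(x) x %in% targets),   # TRUE if any column matches
    drop_row   = is_cat & has_target                       # both criteria hold
  )

# Keep the rows that are not flagged, then drop the helper columns
kept <- flagged %>%
  filter(!drop_row) %>%
  select(col1:col4)

print(flagged)
print(kept)
```

Here drop_row is TRUE only for rows 1 and 4, the two rows the question wants removed. Row 5 has matches in col2:col4 but is kept because col1 is "pig".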

Answer 2

Score: 1

One way is to combine the conditions with logical operators inside filter(), which keeps only the rows for which the whole expression is TRUE:

    library(tidyverse)

    pattern1 <- "cat"
    pattern2 <- c("dog", "fox", "cow")

    # Drop rows where col1 matches pattern1 and at least one of
    # col2, col3, col4 holds a word from pattern2
    df %>%
      filter(!(col1 == pattern1 &
                 (col2 %in% pattern2 |
                    col3 %in% pattern2 |
                    col4 %in% pattern2)))

      col1     col2  col3 col4
    1  fox    tiger   cow  cat
    2  dog elephant sheep  cat
    3  pig      cow   dog  fox
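
If the number of columns to check grows, writing one %in% test per column gets tedious. A possible base R generalization of the same idea, offered only as a sketch and not taken from the original answer, builds one logical vector per column and combines them with an element-wise OR. The names check_cols, hits_per_col, any_hit, and result are hypothetical.

```r
# Sample data from the question
df <- data.frame(
  col1 = c("cat", "fox", "dog", "cat", "pig"),
  col2 = c("lion", "tiger", "elephant", "dog", "cow"),
  col3 = c("bird", "cow", "sheep", "fox", "dog"),
  col4 = c("dog", "cat", "cat", "cow", "fox")
)

pattern1 <- "cat"
pattern2 <- c("dog", "fox", "cow")

# Hypothetical helper: the columns to search for pattern2
check_cols <- c("col2", "col3", "col4")

# One logical vector per column: does that column hold a word from pattern2?
hits_per_col <- lapply(df[check_cols], `%in%`, pattern2)

# Element-wise OR across the per-column vectors
# (for this data: TRUE, TRUE, FALSE, TRUE, TRUE)
any_hit <- Reduce(`|`, hits_per_col)

# Keep every row except those where both criteria hold
result <- df[!(df$col1 == pattern1 & any_hit), ]
print(result)
```

Unlike filter(), base subsetting keeps the original row names, so the remaining rows are labelled 2, 3, and 5.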

huangapple
  • Posted on 2023-02-24 03:56:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75549724.html