March 12, 2023 17:19:56

Using filter to select specific values of multiple columns in R

# Question

I'm trying to use the `filter` function in R to keep only the values I need in multiple columns, while also keeping missing values.
```r
library(dplyr)  # provides filter() and the %>% pipe

x <- 1:2:3:4:5:NA
y <- 3:4:NA:5:6:NA
z <- 2:3:4:NA:5:6
df <- data.frame(x, y, z)

# Attempt: keep rows where x is not 1, 2 or 3, or is NA
df %>%
  filter(x != 1, 2, 3 | is.na(x))
```

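I am trying to filter to values greater than 4 in columns x, y and z while keeping NA. The attempt above gives the error 'input must be logical vector, not a double'. Any suggestions for fixing this error, and for applying the condition to all three columns?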
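The error comes from how `filter()` reads its arguments: each comma-separated argument is a separate condition, and multiple conditions are combined with `&`. So `filter(x != 1, 2, 3 | is.na(x))` asks for `x != 1`, then `2`, then `3 | is.na(x)`, and the bare `2` is a double rather than a logical vector. A minimal base R sketch of the intended "not 1, 2 or 3" test, assuming the same values the answers below use (the name `keep` is only illustrative):

```r
x <- c(1, 2, 3, 4, 5, NA)

# `x != 1, 2, 3` is not a membership test; %in% is.
# %in% never returns NA, so the NA element survives the negation.
keep <- !(x %in% c(1, 2, 3))
print(keep)
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
```

Inside `filter()` the same single condition would be written `filter(!(x %in% c(1, 2, 3)))`.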


# Answer 1
**Score**: 0
Here is a solution using dplyr:
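```r
library(dplyr)

df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# Conditions separated by commas are combined with &:
# a row is kept only if every column is >= 4 or missing
df %>%
  filter(
    x >= 4 | is.na(x),
    y >= 4 | is.na(y),
    z >= 4 | is.na(z)
  )
```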
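To apply the same condition to every column without writing it out once per column, one option is to build the per-column tests with `lapply()` and combine them with `Reduce()`. A base R sketch, assuming the `df` from this answer (the names `passes`, `keep` and `result` are illustrative):

```r
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# For each column: TRUE where the value is >= 4 or missing
passes <- lapply(df, function(col) col >= 4 | is.na(col))

# Combine the per-column vectors with & so a row must pass in every column
keep <- Reduce(`&`, passes)

result <- df[keep, ]
print(result)
#>    x  y  z
#> 4  4  5 NA
#> 5  5  6  5
#> 6 NA NA  6
```

In dplyr, `if_all()` expresses the same idea inside `filter()`, e.g. `df %>% filter(if_all(everything(), ~ .x >= 4 | is.na(.x)))`.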


# Answer 2

**Score**: 0

In base R:

```r
subset(df, pmin(x, y, z, na.rm = TRUE) >= 4)
#>    x  y  z
#> 4  4  5 NA
#> 5  5  6  5
#> 6 NA NA  6
```
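Why this works: `pmin(..., na.rm = TRUE)` takes the row-wise minimum while ignoring missing values, so the minimum is at least 4 exactly when every non-missing value in the row is at least 4. One edge case is a row in which every value is `NA`: its minimum is `NA`, and `subset()` drops rows where the condition is `NA`. A small sketch with made-up vectors (the all-`NA` fourth row is hypothetical and not part of the question's data):

```r
x <- c(4, 5, NA, NA)
y <- c(5, 6, NA, NA)
z <- c(NA, 5, 6, NA)
d <- data.frame(x, y, z)

# Row-wise minimum, skipping NAs; an all-NA row stays NA
m <- pmin(x, y, z, na.rm = TRUE)
print(m)
#> [1]  4  5  6 NA

# Rows whose condition is NA are dropped by subset()
print(nrow(subset(d, m >= 4)))
#> [1] 3

# Keep all-NA rows too by treating a missing minimum as a pass
print(nrow(subset(d, is.na(m) | m >= 4)))
#> [1] 4
```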

If you have many columns and do not want to reference them by name:

```r
subset(df, do.call(pmin, c(na.rm = TRUE, df)) >= 4)
#>    x  y  z
#> 4  4  5 NA
#> 5  5  6  5
#> 6 NA NA  6
```
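The `do.call()` form is the same computation with the argument list built programmatically: `c(na.rm = TRUE, df)` returns a plain list whose first element is `na.rm = TRUE` followed by one element per column, and `do.call()` passes each element as an argument to `pmin()`. A sketch checking that equivalence on the question's data (`args`, `m1` and `m2` are illustrative names):

```r
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# c() on a logical and a data frame returns a list:
# list(na.rm = TRUE, x = ..., y = ..., z = ...)
args <- c(na.rm = TRUE, df)

# do.call() passes each list element as an argument, so this equals
# pmin(df$x, df$y, df$z, na.rm = TRUE)
m1 <- do.call(pmin, args)
m2 <- pmin(df$x, df$y, df$z, na.rm = TRUE)
print(identical(m1, m2))
#> [1] TRUE
print(m1)
#> [1] 1 2 3 4 5 6
```

To look at only some of the columns, pass a subset such as `df[c("x", "z")]` in place of `df`.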


# Answer 3

**Score**: -1

First of all, please provide a reproducible example:

```r
x <- c(1:5, NA)
y <- c(3:4, NA, 5:6, NA)
z <- c(2:4, NA, 5:6)
```
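The reason for rewriting the vectors: `:` builds a range between two endpoints rather than joining values, so `1:2:3:4:5:NA` from the question is evaluated left to right as nested ranges and fails once it reaches the `NA` endpoint, while `c()` concatenates values, `NA` included. A small sketch of the difference (`msg` is an illustrative name):

```r
# c() concatenates individual values, including NA
x <- c(1:5, NA)
print(x)
#> [1]  1  2  3  4  5 NA

# `:` needs two non-missing endpoints; an NA endpoint raises an error,
# which tryCatch() turns into its message here
msg <- tryCatch(1:NA, error = function(e) conditionMessage(e))
print(msg)
```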

Then I would recommend using the {data.table} package:

```r
library(data.table)
dt <- data.table(x, y, z)
```

And then you can apply filters like so:

```r
dt[x >= 4 | is.na(x), ]
```

(Meaning: give me all rows of the table where x is greater than or equal to 4, or where x is NA.)
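The `| is.na(x)` part is needed because a comparison with `NA` returns `NA`, not `FALSE`, while `NA | TRUE` is `TRUE`; the extra term turns every missing value into a pass, so rows with a missing x are not lost. A sketch of this three-valued logic on the same `x`:

```r
x <- c(1:5, NA)

# Comparing NA with 4 gives NA
print(x >= 4)
#> [1] FALSE FALSE FALSE  TRUE  TRUE    NA

# OR-ing with is.na(x) turns that NA into TRUE
print(x >= 4 | is.na(x))
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
```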

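You can further combine other logical constraints. To require the condition in all three columns, as the question asks, join them with `&`:

```r
# & keeps a row only when x, y and z each pass (value >= 4 or NA)
dt[(x >= 4 | is.na(x)) & (y >= 4 | is.na(y)) & (z >= 4 | is.na(z)), ]
```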
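Whether the per-column tests are joined with `|` or `&` changes the result considerably: `|` keeps a row when any column passes, `&` only when all of them do, which is what the question asks for. A base R sketch on the same vectors (the `ok_*`, `any_col` and `all_col` names are illustrative):

```r
x <- c(1:5, NA)
y <- c(3:4, NA, 5:6, NA)
z <- c(2:4, NA, 5:6)

# Per-column test: value >= 4 or missing
ok_x <- x >= 4 | is.na(x)
ok_y <- y >= 4 | is.na(y)
ok_z <- z >= 4 | is.na(z)

any_col <- which(ok_x | ok_y | ok_z)  # at least one column passes
all_col <- which(ok_x & ok_y & ok_z)  # every column passes
print(any_col)
#> [1] 2 3 4 5 6
print(all_col)
#> [1] 4 5 6
```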

Further information on the {data.table} syntax can be found here: https://rdatatable.gitlab.io/data.table/

