Using filter to select specific values of multiple columns in R


# Question

Trying to use the filter function in R to select only the values needed in multiple columns, while also keeping missing values.

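```R
x <- 1:2:3:4:5:NA
y <- 3:4:NA:5:6:NA
z <- 2:3:4:NA:5:6
df <- data.frame(x, y, z)

df %>%
  filter(x != 1, 2, 3 | is.na(x))
```

I am trying to filter to values more than 4 in columns x, y and z while keeping NA. Using the attempt above gives the error 'input must be logical vector, not a double'. Any suggestions for fixing this error, and for applying the command to all three columns?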




# Answer 1
**Score**: 0

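Here is a solution using dplyr:
```r
library(dplyr)

df <- data.frame(
  x = c(1,2,3,4,5,NA),
  y = c(3,4,NA,5,6,NA),
  z = c(2,3,4,NA,5,6)
)

# Comma-separated conditions are combined with AND, so a row is kept
# only when every column is >= 4 or NA
df %>%
  filter(
    x >= 4 | is.na(x),
    y >= 4 | is.na(y),
    z >= 4 | is.na(z)
  )
```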
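The error in the question comes from how `filter()` reads its arguments: each comma-separated argument is a separate condition, so in `filter(x != 1, 2, 3 | is.na(x))` the bare `2` is evaluated on its own, and it is a double rather than a logical vector. The sketch below is an illustration, not part of the original answer. It shows one way to express "not 1, 2 or 3" with a `%in%` membership test, and one way to apply a single condition to every column with `if_all()` instead of naming each column. It assumes dplyr 1.0.4 or later (where `if_all()` was introduced); `res_x` and `res_all` are illustrative names.

```r
library(dplyr)  # assumption: dplyr >= 1.0.4 for if_all()

df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# One condition instead of three arguments: `!x %in% c(1, 2, 3)` parses
# as !(x %in% c(1, 2, 3)). Because NA %in% c(1, 2, 3) is FALSE, the
# negation is TRUE for NA, so NA rows are kept without is.na().
res_x <- df %>% filter(!x %in% c(1, 2, 3))

# Apply the same test to every column without naming them:
# if_all() keeps a row only when the condition holds in all columns.
res_all <- df %>% filter(if_all(everything(), ~ .x >= 4 | is.na(.x)))
res_all
```

Using `everything()` selects all columns; a narrower selection such as `c(x, y, z)` could be passed instead if the data frame contains columns that should not be tested.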

# Answer 2

**Score**: 0

In base R:

```r
subset(df, pmin(x, y, z, na.rm = TRUE) >= 4)
```
```
   x  y  z
4  4  5 NA
5  5  6  5
6 NA NA  6
```

In case you have many columns and do not want to reference them by name:

```r
subset(df, do.call(pmin, c(na.rm = TRUE, df)) >= 4)
```
```
   x  y  z
4  4  5 NA
5  5  6  5
6 NA NA  6
```
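To see why the `pmin()` approach works, it may help to look at the intermediate row-wise minimum. The sketch below recomputes it and filters with plain `[` indexing instead of `subset()`; `row_min`, `keep` and `res` are illustrative names. One behavioural difference from the dplyr answer is worth keeping in mind: a row that is `NA` in every column has no non-missing minimum, so `pmin(..., na.rm = TRUE)` returns `NA` for it and this approach drops the row, whereas testing `col >= 4 | is.na(col)` on each column would keep it.

```r
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# Row-wise minimum across all columns, ignoring NA within each row.
# do.call() passes each column of df as a separate argument to pmin().
row_min <- do.call(pmin, c(df, na.rm = TRUE))

# A row qualifies when its smallest non-missing value is >= 4.
# subset() treats an NA condition as FALSE; with `[` indexing the
# explicit !is.na() guard gives the same behaviour for all-NA rows.
keep <- !is.na(row_min) & row_min >= 4
res <- df[keep, ]
res
```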

# Answer 3

**Score**: -1

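First of all, please provide a reproducible example:

```r
x <- c(1:5, NA)
y <- c(3:4, NA, 5:6, NA)
z <- c(2:4, NA, 5:6)
```

Then I would recommend using the {data.table} package:

```r
library(data.table)
dt <- data.table(x, y, z)
```

And then you can apply filters like so:

```r
dt[x >= 4 | is.na(x), ]
```

(meaning: give me all rows of the table where x is greater than or equal to 4 or where x is NA.)

You can further combine other logical constraints:

```r
dt[(x >= 4 | is.na(x)) | (y >= 4 | is.na(y)) | (z >= 4 | is.na(z)), ]
```

Further information on the {data.table} syntax can be found here: https://rdatatable.gitlab.io/data.table/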

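Joining the three bracketed conditions with `|` keeps a row when any single column qualifies; on this data that also keeps rows 2 and 3, where `x` is 2 and 3. To require all three columns, as the question and the other answers do, the conditions can be joined with `&` instead. The sketch below shows that, plus one way to generalise it to an arbitrary list of columns with `lapply()` and `Reduce()`. It assumes a data.table version recent enough to support the `..cols` prefix in `j`; `cols`, `keep`, `res_all` and `res_many` are illustrative names.

```r
library(data.table)

dt <- data.table(
  x = c(1:5, NA),
  y = c(3:4, NA, 5:6, NA),
  z = c(2:4, NA, 5:6)
)

# Require every column to qualify: join the per-column tests with &.
res_all <- dt[(x >= 4 | is.na(x)) & (y >= 4 | is.na(y)) & (z >= 4 | is.na(z)), ]

# Same filter for any set of columns: build one logical vector per
# column, then AND them together element-wise with Reduce().
cols <- c("x", "y", "z")
keep <- Reduce(`&`, lapply(dt[, ..cols], function(v) v >= 4 | is.na(v)))
res_many <- dt[keep, ]
res_many
```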

huangapple
  • Published on 2023-03-12 17:19:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75712138.html