Using filter to select specific values across multiple columns in R
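# Question
I am trying to use the filter function in R to select only the values I need across multiple columns while also keeping missing values.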
```R
x <- 1:2:3:4:5:NA
y <- 3:4:NA:5:6:NA
z <- 2:3:4:NA:5:6
df <- data.frame(x, y, z)
df %>%
  filter(x != 1, 2, 3 | is.na(x))
```
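I am trying to filter for values greater than 4 in columns x, y and z while keeping NA. The attempt above gives the error 'input must be logical vector, not a double'. Any suggestions on how to fix this error, and on how to apply the filter to all three columns?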
# Answer 1
**Score**: 0
Here is a solution using dplyr:
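```r
library(dplyr)

df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# Keep rows where each column is >= 4 or NA
# (use > 4 instead for strictly greater than 4)
df %>%
  filter(
    x >= 4 | is.na(x),
    y >= 4 | is.na(y),
    z >= 4 | is.na(z)
  )
```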
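As a side note on the error itself: in a call such as `filter(x != 1, 2, 3 | is.na(x))`, each comma starts a separate condition, so the bare `2` is passed as a condition of its own and is a double rather than a logical vector. The sketch below shows one possible way to apply the same rule to every column without naming each one, assuming a dplyr version recent enough to provide `if_all()`, and uses `%in%` to exclude several values at once. The names `result` and `excluded` are only illustrative.

```r
library(dplyr)

# Same sample data as in the answer above
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# Keep rows where every column is >= 4 or NA
result <- df %>%
  filter(if_all(everything(), ~ .x >= 4 | is.na(.x)))

# Excluding several values: use %in% rather than `x != 1, 2, 3`
excluded <- df %>%
  filter(!x %in% c(1, 2, 3) | is.na(x))
```

To restrict the rule to some columns only, `everything()` can be swapped for a selection such as `c(x, y)`.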
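# Answer 2
**Score**: 0
In base R:
```r
subset(df, pmin(x, y, z, na.rm = TRUE) >= 4)
#    x  y  z
# 4  4  5 NA
# 5  5  6  5
# 6 NA NA  6
```
In case you have very many columns and do not want to reference them by name:
```r
subset(df, do.call(pmin, c(na.rm = TRUE, df)) >= 4)
#    x  y  z
# 4  4  5 NA
# 5  5  6  5
# 6 NA NA  6
```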
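To see why the `pmin` approach works, here is a small sketch in base R that exposes the intermediate row-wise minimum and compares it with an explicit per-column check. The variable names (`row_min`, `keep_pmin`, `keep_explicit`) are illustrative and not part of the answer.

```r
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# Row-wise minimum, ignoring NA within each row
row_min <- pmin(df$x, df$y, df$z, na.rm = TRUE)

# If the smallest non-missing value in a row is >= 4, all of them are
keep_pmin <- row_min >= 4

# The same rule written column by column, then combined with AND
keep_explicit <- Reduce(`&`, lapply(df, function(col) col >= 4 | is.na(col)))

filtered <- df[keep_explicit, ]
```

One difference to keep in mind: for a row in which every column is NA, `pmin(..., na.rm = TRUE)` returns NA, so `subset()` would drop that row, whereas the explicit per-column check would keep it. The sample data has no such row, so both give the same result here.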
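# Answer 3
**Score**: -1
First of all, please provide a reproducible example:
```r
x <- c(1:5, NA)
y <- c(3:4, NA, 5:6, NA)
z <- c(2:4, NA, 5:6)
```
Then I would recommend using the {data.table} package:
```r
library(data.table)
dt <- data.table(x, y, z)
```
You can then apply filters like so:
```r
dt[x >= 4 | is.na(x), ]
```
(Meaning: give me all rows of the table where x is greater than or equal to 4 or where x is NA.)
You can further combine other logical conditions:
```r
# Note: `|` between the groups keeps a row if any column qualifies;
# use `&` instead to require the condition in every column.
dt[(x >= 4 | is.na(x)) | (y >= 4 | is.na(y)) | (z >= 4 | is.na(z)), ]
```
Further information on the {data.table} syntax can be found here: https://rdatatable.gitlab.io/data.table/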
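For the "all columns" part of the question, a possible data.table sketch is shown below. It builds one logical vector by applying the condition to every column through `.SD` and combining the results with `&`, so a row is kept only when every column is at least 4 or NA. The names `keep` and `result_dt` are illustrative, and this is one way to do it rather than the only one.

```r
library(data.table)

dt <- data.table(
  x = c(1:5, NA),
  y = c(3:4, NA, 5:6, NA),
  z = c(2:4, NA, 5:6)
)

# One logical value per row: TRUE when every column is >= 4 or NA
keep <- dt[, Reduce(`&`, lapply(.SD, function(col) col >= 4 | is.na(col)))]

# Subset the rows with that logical vector
result_dt <- dt[keep, ]
```

Replacing `&` with `|` in the `Reduce()` call would instead keep rows where at least one column satisfies the condition.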