Using filter to select specific values of multiple columns in R

# Question

I am trying to use the filter function in R to select only the values I need in multiple columns, while also keeping missing values.

```r
x <- 1:2:3:4:5:NA
y <- 3:4:NA:5:6:NA
z <- 2:3:4:NA:5:6
df <- data.frame(x, y, z)

df %>%
  filter(x != 1, 2, 3 | is.na(x))
```

I want to keep the values greater than 4 in columns x, y and z, while keeping NA. The attempt above gives the error 'input must be logical vector, not a double'. Any suggestions on how to fix this error, and on how to apply the condition to all three columns?
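A note on where the error in the question most likely comes from: in `filter(x != 1, 2, 3 | is.na(x))` the commas separate three conditions, and the second one is the bare number `2`, which is a double rather than a logical vector. The base R sketch below only illustrates that point; the name `keep` is made up for the example, and `%in%` is one possible way to test against several values, not necessarily what the asker intended.

```r
# Build the vector with c(); chaining ":" as in 1:2:3:4:5:NA does not
# concatenate the values.
x <- c(1, 2, 3, 4, 5, NA)

# A bare number is not a logical vector, which is what filter() complains about.
is.logical(2)        # FALSE
is.logical(x != 1)   # TRUE

# To exclude several values at once, test membership with %in%.
# %in% returns FALSE (never NA) for missing values, so NA is kept here;
# the explicit is.na() check just makes that intent visible.
keep <- !(x %in% c(1, 2, 3)) | is.na(x)
x[keep]              # 4 5 NA
```

The same logical expression can be passed to `filter()` as a single condition.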
# Answer 1

**Score**: 0

Here is a solution using dplyr:

```r
library(dplyr)

df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# Comma-separated conditions are combined with AND;
# `| is.na()` keeps rows where that column is missing.
df %>%
  filter(
    x >= 4 | is.na(x),
    y >= 4 | is.na(y),
    z >= 4 | is.na(z)
  )
```
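For the second part of the question (applying the same condition to all three columns without repeating it), one option is dplyr's `if_all()`. This is a minimal sketch, assuming dplyr 1.0.4 or later is installed; `res` is just an illustrative name, and the threshold `>= 4` follows the dplyr answer (use `> 4` if values strictly greater than 4 are wanted).

```r
library(dplyr)

df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# if_all() applies the condition to each selected column and keeps a row
# only when it holds in every one of them.
res <- df %>%
  filter(if_all(c(x, y, z), ~ .x >= 4 | is.na(.x)))

res
```

Replacing `c(x, y, z)` with `everything()` should apply the condition to every column of the data frame.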

# Answer 2

**Score**: 0

In base R:

```r
subset(df, pmin(x, y, z, na.rm = TRUE) >= 4)
#>    x  y  z
#> 4  4  5 NA
#> 5  5  6  5
#> 6 NA NA  6
```

In case you have very many columns and do not want to reference them by name:

```r
subset(df, do.call(pmin, c(na.rm = TRUE, df)) >= 4)
#>    x  y  z
#> 4  4  5 NA
#> 5  5  6  5
#> 6 NA NA  6
```
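To see why the `pmin()` approach works: with `na.rm = TRUE` it returns the smallest non-missing value in each row, so a row passes when none of its non-missing values is below 4. One caveat is that a row consisting only of NA would give NA and be dropped by `subset()`. The sketch below shows the row minimum and an alternative with `rowSums()` that would keep such a row; `row_min` and `ok` are illustrative names, not part of the answer.

```r
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(3, 4, NA, 5, 6, NA),
  z = c(2, 3, 4, NA, 5, 6)
)

# Per-row minimum, ignoring missing values.
row_min <- do.call(pmin, c(df, na.rm = TRUE))
row_min   # 1 2 3 4 5 6

# Alternative: count the values below 4 in each row (NA is not counted)
# and keep the rows where that count is zero.
ok <- rowSums(df < 4, na.rm = TRUE) == 0
df[ok, ]
```

On this data both approaches keep rows 4, 5 and 6.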

# Answer 3

**Score**: -1

First of all, please provide a reproducible example:

```r
x <- c(1:5, NA)
y <- c(3:4, NA, 5:6, NA)
z <- c(2:4, NA, 5:6)
```

Then I would recommend using the package {data.table}:

```r
library(data.table)
dt <- data.table(x, y, z)
```

You can then apply filters like so:

```r
dt[x >= 4 | is.na(x), ]
```

(Meaning: give me all rows of the table where x is greater than or equal to 4, or where x is NA.)

You can further combine other logical constraints:

```r
dt[(x >= 4 | is.na(x)) | (y >= 4 | is.na(y)) | (z >= 4 | is.na(z)), ]
```

Note that joining the per-column conditions with `|` keeps a row as soon as any one column qualifies; join them with `&` instead to require the condition in all three columns, as the question asks.

Further information on the {data.table} syntax can be found here: https://rdatatable.gitlab.io/data.table/
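If the condition has to hold in every column, the per-column tests can be combined with `&` rather than written out one by one. The following is a rough sketch of one way to do that in {data.table}, using `.SD` with `.SDcols` and `Reduce()`; the names `cols`, `keep` and `res` are made up for the example.

```r
library(data.table)

dt <- data.table(
  x = c(1:5, NA),
  y = c(3:4, NA, 5:6, NA),
  z = c(2:4, NA, 5:6)
)

cols <- c("x", "y", "z")

# Evaluate the condition for each column in .SD, then combine the
# resulting logical vectors with AND.
keep <- dt[, Reduce(`&`, lapply(.SD, function(v) v >= 4 | is.na(v))),
           .SDcols = cols]

res <- dt[keep]
res
```

Swapping `` `&` `` for `` `|` `` in `Reduce()` would give the "any column" behaviour instead.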

huangapple
  • Posted on 2023-03-12 17:19:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75712138.html