How to select columns based on their properties?

Question

Here are code examples that filter the data frame, based on the description in the question text further down:

In base R:

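    # Build the example data frame from the question (all values are character)
    df <- data.frame(
      c1  = rep("0", 7),
      c2  = rep("1", 7),
      c3  = c("0", "0", "0", "0", "0", "0", "1"),
      c4  = c("1", "1", "1", "1", "1", "1", "0"),
      c5  = c("?", "1", "1", "1", "1", "1", "1"),
      c6  = c("?", "0", "0", "0", "0", "0", "0"),
      c7  = c("?", "?", "0", "0", "0", "0", "0"),
      c8  = c("?", "?", "1", "1", "1", "1", "1"),
      c9  = c("0", "0", "1", "1", "?", "?", "0"),
      c10 = c("1", "1", "0", "0", "1", "0", "0")
    )
    # Drop invariant columns (a single distinct value, i.e. all "0" or all "1")
    invariant <- sapply(df, function(x) length(unique(x)) == 1)
    df_filtered <- df[, !invariant, drop = FALSE]
    # Drop columns that contain at least one "?"
    has_missing <- sapply(df_filtered, function(x) any(x == "?"))
    df_filtered <- df_filtered[, !has_missing, drop = FALSE]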

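In tidyverse:

    library(dplyr)
    # Build the example data frame from the question (all values are character)
    df <- data.frame(
      c1  = rep("0", 7),
      c2  = rep("1", 7),
      c3  = c("0", "0", "0", "0", "0", "0", "1"),
      c4  = c("1", "1", "1", "1", "1", "1", "0"),
      c5  = c("?", "1", "1", "1", "1", "1", "1"),
      c6  = c("?", "0", "0", "0", "0", "0", "0"),
      c7  = c("?", "?", "0", "0", "0", "0", "0"),
      c8  = c("?", "?", "1", "1", "1", "1", "1"),
      c9  = c("0", "0", "1", "1", "?", "?", "0"),
      c10 = c("1", "1", "0", "0", "1", "0", "0")
    )
    # Drop invariant columns (a single distinct value, i.e. all "0" or all "1")
    df_filtered <- df %>%
      select_if(~ length(unique(.x)) > 1)
    # Drop columns that contain at least one "?"
    df_filtered <- df_filtered %>%
      select_if(~ !any(.x == "?"))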

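Whether you use base R or tidyverse, the code above produces a new data frame, df_filtered, without the columns that are uniformly 0 or 1 and without the columns that contain "?".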
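The two filtering steps can also be folded into one predicate and applied with base R's Filter(). This is only a sketch: the helper name keep_column is made up for illustration, and the data is read with colClasses = "character" on the assumption that every value is a string, as the question states.

```r
# Example data from the question, read as character so "0", "1" and "?" are all strings.
df <- read.table(header = TRUE, colClasses = "character", text = "c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0 1 0 1 ? ? ? ? 0 1
r2 0 1 0 1 1 0 ? ? 0 1
r3 0 1 0 1 1 0 0 1 1 0
r4 0 1 0 1 1 0 0 1 1 0
r5 0 1 0 1 1 0 0 1 ? 1
r6 0 1 0 1 1 0 0 1 ? 0
r7 0 1 1 0 1 0 0 1 0 0")

# Hypothetical helper: keep a column only if it has no "?" and more than one distinct value.
keep_column <- function(x) !any(x == "?") && length(unique(x)) > 1

# Filter() applies the predicate to each column and returns the surviving columns.
df_filtered <- Filter(keep_column, df)
names(df_filtered)
```

A single predicate keeps both rules in one place, so adding a further exclusion rule later means changing only keep_column.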

I have a data frame with the following three values: 0, 1, and ?. The 0s and 1s are characters and not numeric data. I am trying to subset the data frame to remove the following:

  1. All columns that are uniformly 0 or 1
  2. All columns that have at least one ?

So the dataset should have no invariant columns or columns with missing values.

Here is an illustration of the data frame:

       c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
    r1  0  1  0  1  ?  ?  ?  ?  0   1
    r2  0  1  0  1  1  0  ?  ?  0   1
    r3  0  1  0  1  1  0  0  1  1   0
    r4  0  1  0  1  1  0  0  1  1   0
    r5  0  1  0  1  1  0  0  1  ?   1
    r6  0  1  0  1  1  0  0  1  ?   0
    r7  0  1  1  0  1  0  0  1  0   0

So I want to exclude c1, c2, c5, c6, c7, c8, and c9. How do I do this in base R or tidyverse?

Answer 1

Score: 3

In tidyverse:

    df %>% select_if(~ !any(.x == '?') & !all(.x == 1) & !all(.x == 0))
       c3 c4 c10
    r1  0  1   1
    r2  0  1   1
    r3  0  1   0
    r4  0  1   0
    r5  0  1   1
    r6  0  1   0
    r7  1  0   0
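select_if() is marked as superseded in current dplyr releases, so the same filter can be written with select() and where(). The following is a sketch rather than part of the original answer: it assumes dplyr 1.0.0 or later, reads the data with colClasses = "character" to match the question, and uses the illustrative name kept for the result.

```r
library(dplyr)

# Example data from the question, read as character so "0", "1" and "?" are all strings.
df <- read.table(header = TRUE, colClasses = "character", text = "c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0 1 0 1 ? ? ? ? 0 1
r2 0 1 0 1 1 0 ? ? 0 1
r3 0 1 0 1 1 0 0 1 1 0
r4 0 1 0 1 1 0 0 1 1 0
r5 0 1 0 1 1 0 0 1 ? 1
r6 0 1 0 1 1 0 0 1 ? 0
r7 0 1 1 0 1 0 0 1 0 0")

# Keep a column only if it has no "?" and more than one distinct value.
kept <- df %>%
  select(where(~ !any(.x == "?") && n_distinct(.x) > 1))
names(kept)
```

Testing n_distinct(.x) > 1 covers both "all 0" and "all 1" in one condition, and it would also catch any other constant column.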

Answer 2

Score: 2

    > sapply(df, is.integer) & colMeans(df == 0) < 1 & colMeans(df == 1) < 1
       c1    c2    c3    c4    c5    c6    c7    c8    c9   c10 
    FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE 
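The expression above returns a logical vector, not the subset itself, and its is.integer() test only works if the columns without ? were read as integers. The sketch below assumes the data came from read.table() with its default type inference. It then shows how to apply the mask and how to build an equivalent mask when every column is character, as the question describes. The names keep, df_chr and keep_chr are illustrative.

```r
# Read the question's table with read.table()'s default type inference:
# columns containing "?" are not integer (character in R 4.0 and later), the rest are integer.
df <- read.table(header = TRUE, text = "c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0 1 0 1 ? ? ? ? 0 1
r2 0 1 0 1 1 0 ? ? 0 1
r3 0 1 0 1 1 0 0 1 1 0
r4 0 1 0 1 1 0 0 1 1 0
r5 0 1 0 1 1 0 0 1 ? 1
r6 0 1 0 1 1 0 0 1 ? 0
r7 0 1 1 0 1 0 0 1 0 0")

# Mask from the answer: integer column (no "?"), not all 0, not all 1.
keep <- sapply(df, is.integer) & colMeans(df == 0) < 1 & colMeans(df == 1) < 1

# Apply the mask; drop = FALSE keeps a data frame even if a single column survives.
df_filtered <- df[, keep, drop = FALSE]
names(df_filtered)

# All-character variant: is.integer() would be FALSE for every column,
# so test for "?" directly instead.
df_chr <- df
df_chr[] <- lapply(df, as.character)
keep_chr <- colSums(df_chr == "?") == 0 &
  colMeans(df_chr == "0") < 1 &
  colMeans(df_chr == "1") < 1
```

Both masks select the same columns on this data, so the character variant is the safer choice when the column types are not known in advance.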

huangapple
  • Posted on March 9, 2023, 23:51:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75687002.html