How to select columns based on their properties?

Question

Below is a code example, generated from the information you provided, that filters the data frame:

In base R:

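# Create the example data frame from the illustration (0s and 1s stored as characters)
df <- data.frame(
  c1  = c("0", "0", "0", "0", "0", "0", "0"),
  c2  = c("1", "1", "1", "1", "1", "1", "1"),
  c3  = c("0", "0", "0", "0", "0", "0", "1"),
  c4  = c("1", "1", "1", "1", "1", "1", "0"),
  c5  = c("?", "1", "1", "1", "1", "1", "1"),
  c6  = c("?", "0", "0", "0", "0", "0", "0"),
  c7  = c("?", "?", "0", "0", "0", "0", "0"),
  c8  = c("?", "?", "1", "1", "1", "1", "1"),
  c9  = c("0", "0", "1", "1", "?", "?", "0"),
  c10 = c("1", "1", "0", "0", "1", "0", "0"),
  row.names = paste0("r", 1:7)
)

# Remove invariant columns, i.e. columns that are uniformly "0" or uniformly "1"
df_filtered <- df[, !sapply(df, function(x) all(x == "0") || all(x == "1")), drop = FALSE]

# Remove columns that contain at least one "?"
df_filtered <- df_filtered[, !sapply(df_filtered, function(x) any(x == "?")), drop = FALSE]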

In the tidyverse:

library(dplyr)

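# Create the example data frame from the illustration (0s and 1s stored as characters)
df <- data.frame(
  c1  = c("0", "0", "0", "0", "0", "0", "0"),
  c2  = c("1", "1", "1", "1", "1", "1", "1"),
  c3  = c("0", "0", "0", "0", "0", "0", "1"),
  c4  = c("1", "1", "1", "1", "1", "1", "0"),
  c5  = c("?", "1", "1", "1", "1", "1", "1"),
  c6  = c("?", "0", "0", "0", "0", "0", "0"),
  c7  = c("?", "?", "0", "0", "0", "0", "0"),
  c8  = c("?", "?", "1", "1", "1", "1", "1"),
  c9  = c("0", "0", "1", "1", "?", "?", "0"),
  c10 = c("1", "1", "0", "0", "1", "0", "0"),
  row.names = paste0("r", 1:7)
)

# Remove invariant columns (uniformly "0" or uniformly "1")
df_filtered <- df %>%
  select_if(~ !(all(. == "0") || all(. == "1")))

# Remove columns that contain at least one "?"
df_filtered <- df_filtered %>%
  select_if(~ !any(. == "?"))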

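Whether you use base R or the tidyverse, the code above produces a new data frame, df_filtered, without the columns that are entirely 0 or entirely 1 and without any column containing "?".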

Original question (English):

I have a data frame with the following three values: 0, 1, and ?. The 0s and 1s are characters and not numeric data. I am trying to subset the data frame to remove the following:

  1. All columns that are uniformly 0 or 1
  2. All columns that have at least one ?

So the dataset should have no invariant columns and no columns with missing values.
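The two removal rules can be written as predicates on a single column. This is a minimal sketch assuming the values are stored as character strings; is_invariant and has_missing are hypothetical helper names, not functions from base R or any package.

```r
# Hypothetical helpers for the two removal rules (values stored as strings)
is_invariant <- function(x) all(x == "0") || all(x == "1")  # rule 1
has_missing  <- function(x) any(x == "?")                   # rule 2

# Three columns taken from the illustration below
c2 <- c("1", "1", "1", "1", "1", "1", "1")  # uniformly "1": removed by rule 1
c3 <- c("0", "0", "0", "0", "0", "0", "1")  # mixed 0/1 and no "?": kept
c9 <- c("0", "0", "1", "1", "?", "?", "0")  # contains "?": removed by rule 2

is_invariant(c2)  # TRUE
is_invariant(c3)  # FALSE
has_missing(c3)   # FALSE
has_missing(c9)   # TRUE
```

Note that a test such as all(x %in% c("0", "1")) is also TRUE for c3, so it cannot tell an invariant column apart from a mixed 0/1 column.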

Here is an illustration of the data frame:

   c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0  1  0  1  ?  ?  ?  ?  0  1
r2 0  1  0  1  1  0  ?  ?  0  1
r3 0  1  0  1  1  0  0  1  1  0
r4 0  1  0  1  1  0  0  1  1  0
r5 0  1  0  1  1  0  0  1  ?  1
r6 0  1  0  1  1  0  0  1  ?  0
r7 0  1  1  0  1  0  0  1  0  0

So I want to exclude c1, c2, c5, c6, c7, c8, and c9. How do I do this in base R or tidyverse?
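A base R sketch that reads the illustration exactly as shown, keeps every value as a character string, and applies both rules in one pass with Filter(), which returns the columns of a data frame for which a predicate is TRUE. The helper name keep_column is hypothetical.

```r
# The illustration from the question; the header has one fewer field than the
# data rows, so read.table() uses the first column (r1..r7) as row names
txt <- "
   c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0  1  0  1  ?  ?  ?  ?  0  1
r2 0  1  0  1  1  0  ?  ?  0  1
r3 0  1  0  1  1  0  0  1  1  0
r4 0  1  0  1  1  0  0  1  1  0
r5 0  1  0  1  1  0  0  1  ?  1
r6 0  1  0  1  1  0  0  1  ?  0
r7 0  1  1  0  1  0  0  1  0  0
"

# colClasses = "character" keeps "0" and "1" as strings instead of integers
df <- read.table(text = txt, header = TRUE, colClasses = "character")

# Hypothetical helper: keep a column only if it has no "?" and, given that,
# more than one distinct value (i.e. it is not uniformly "0" or "1")
keep_column <- function(x) !any(x == "?") && length(unique(x)) > 1

# Filter() applies keep_column to each column and returns a data frame
df_filtered <- Filter(keep_column, df)
df_filtered
```

For this data, df_filtered keeps only c3, c4 and c10, with the row names r1 to r7 preserved.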

Answer 1

Score: 3

In the tidyverse:

library(dplyr)

df %>% select_if(~!any(.x == '?') & !all(.x == 1) & !all(.x == 0))
   c3 c4 c10
r1  0  1   1
r2  0  1   1
r3  0  1   0
r4  0  1   0
r5  0  1   1
r6  0  1   0
r7  1  0   0
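Since dplyr 1.0.0, select_if() is superseded by passing a predicate to select() through where(). A sketch of the same filter in that style, assuming dplyr 1.0.0 or later and the illustration read with every column as character (so the comparisons use the strings "0" and "1"):

```r
library(dplyr)

# The illustration from the question, read with all columns as character
txt <- "
   c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0  1  0  1  ?  ?  ?  ?  0  1
r2 0  1  0  1  1  0  ?  ?  0  1
r3 0  1  0  1  1  0  0  1  1  0
r4 0  1  0  1  1  0  0  1  1  0
r5 0  1  0  1  1  0  0  1  ?  1
r6 0  1  0  1  1  0  0  1  ?  0
r7 0  1  1  0  1  0  0  1  0  0
"
df <- read.table(text = txt, header = TRUE, colClasses = "character")

# where() applies the lambda to each column; .x is the current column
df_filtered <- df %>%
  select(where(~ !any(.x == "?") && !all(.x == "1") && !all(.x == "0")))

df_filtered
```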

Answer 2

Score: 2

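    > # is.integer() is TRUE only for integer columns; this assumes df was read with
    > # read.table(), which parses "?"-free 0/1 columns as integers and the rest as non-integer
    > sapply(df, is.integer) & colMeans(df == 0) < 1 & colMeans(df == 1) < 1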
       c1    c2    c3    c4    c5    c6    c7    c8    c9   c10 
    FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
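Answer 2's expression returns a logical vector rather than the filtered data. As a sketch, that vector can index df directly. Here is.integer() is replaced by an explicit check for "?" (an adaptation, not the answerer's code), so the test also works when every column is stored as character, as the question describes.

```r
# The illustration from the question, read with all columns as character
txt <- "
   c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0  1  0  1  ?  ?  ?  ?  0  1
r2 0  1  0  1  1  0  ?  ?  0  1
r3 0  1  0  1  1  0  0  1  1  0
r4 0  1  0  1  1  0  0  1  1  0
r5 0  1  0  1  1  0  0  1  ?  1
r6 0  1  0  1  1  0  0  1  ?  0
r7 0  1  1  0  1  0  0  1  0  0
"
df <- read.table(text = txt, header = TRUE, colClasses = "character")

# Same idea as the answer, but a "no '?'" test replaces is.integer() so that
# all-character columns are handled; df == "0" yields a logical matrix
keep <- !sapply(df, function(x) any(x == "?")) &
  colMeans(df == "0") < 1 &
  colMeans(df == "1") < 1

# drop = FALSE keeps a data frame even if only one column survives
df_filtered <- df[, keep, drop = FALSE]
df_filtered
```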

huangapple
  • This article was published on March 9, 2023 at 23:51:54
  • If you repost this article, please keep its link: https://go.coder-hub.com/75687002.html