How to select columns based on their properties?
Question
The following example code filters the data frame as described:
In base R:
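# Build a sample data frame matching the illustration (all values are character)
df <- data.frame(
  c1 = rep("0", 7),
  c2 = rep("1", 7),
  c3 = c("0", "0", "0", "0", "0", "0", "1"),
  c4 = c("1", "1", "1", "1", "1", "1", "0"),
  c5 = c("?", "1", "1", "1", "1", "1", "1"),
  c6 = c("?", "0", "0", "0", "0", "0", "0"),
  c7 = c("?", "?", "0", "0", "0", "0", "0"),
  c8 = c("?", "?", "1", "1", "1", "1", "1"),
  c9 = c("0", "0", "1", "1", "?", "?", "0"),
  c10 = c("1", "1", "0", "0", "1", "0", "0"),
  row.names = paste0("r", 1:7)
)
# Drop invariant columns (those holding a single distinct value)
df_filtered <- df[, sapply(df, function(x) length(unique(x)) > 1), drop = FALSE]
# Drop columns that contain at least one "?"
df_filtered <- df_filtered[, !sapply(df_filtered, function(x) any(x == "?")), drop = FALSE]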
In the tidyverse:
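library(dplyr)
# Build a sample data frame matching the illustration (all values are character)
df <- data.frame(
  c1 = rep("0", 7),
  c2 = rep("1", 7),
  c3 = c("0", "0", "0", "0", "0", "0", "1"),
  c4 = c("1", "1", "1", "1", "1", "1", "0"),
  c5 = c("?", "1", "1", "1", "1", "1", "1"),
  c6 = c("?", "0", "0", "0", "0", "0", "0"),
  c7 = c("?", "?", "0", "0", "0", "0", "0"),
  c8 = c("?", "?", "1", "1", "1", "1", "1"),
  c9 = c("0", "0", "1", "1", "?", "?", "0"),
  c10 = c("1", "1", "0", "0", "1", "0", "0"),
  row.names = paste0("r", 1:7)
)
# Drop invariant columns (those holding a single distinct value)
df_filtered <- df %>%
  select_if(~ n_distinct(.) > 1)
# Drop columns that contain at least one "?"
df_filtered <- df_filtered %>%
  select_if(~ !any(. == "?"))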
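Whether you use base R or the tidyverse, the code above produces a new data frame, df_filtered, that contains neither the invariant columns (all 0 or all 1) nor the columns containing "?".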
I have a data frame with the following three values: 0, 1, and ?. The 0s and 1s are characters and not numeric data. I am trying to subset the data frame to remove the following:
- All columns that are uniformly 0 or 1
- All columns that have at least one ?
So the dataset should have no invariant columns and no columns with missing values.
Here is an illustration of the data frame:
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
r1 0 1 0 1 ? ? ? ? 0 1
r2 0 1 0 1 1 0 ? ? 0 1
r3 0 1 0 1 1 0 0 1 1 0
r4 0 1 0 1 1 0 0 1 1 0
r5 0 1 0 1 1 0 0 1 ? 1
r6 0 1 0 1 1 0 0 1 ? 0
r7 0 1 1 0 1 0 0 1 0 0
So I want to exclude c1, c2, c5, c6, c7, c8, and c9. How do I do this in base R or tidyverse?
Answer 1
Score: 3
In tidyverse:
df %>% select_if(~!any(.x == '?') & !all(.x == 1) & !all(.x == 0))
c3 c4 c10
r1 0 1 1
r2 0 1 1
r3 0 1 0
r4 0 1 0
r5 0 1 1
r6 0 1 0
r7 1 0 0
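As of dplyr 1.0.0, `select_if()` is superseded in favor of `select()` combined with `where()`. A minimal sketch of the same filter in that style follows. It assumes the data frame is rebuilt from the illustration in the question with all values stored as character, and it swaps in `n_distinct()` for the two `all()` checks, which is a substitution and not part of the original answer. The name `df_kept` is illustrative.

```r
library(dplyr)

# Sample data rebuilt from the question's illustration (assumed: all character)
df <- data.frame(
  c1  = rep("0", 7),
  c2  = rep("1", 7),
  c3  = c("0", "0", "0", "0", "0", "0", "1"),
  c4  = c("1", "1", "1", "1", "1", "1", "0"),
  c5  = c("?", "1", "1", "1", "1", "1", "1"),
  c6  = c("?", "0", "0", "0", "0", "0", "0"),
  c7  = c("?", "?", "0", "0", "0", "0", "0"),
  c8  = c("?", "?", "1", "1", "1", "1", "1"),
  c9  = c("0", "0", "1", "1", "?", "?", "0"),
  c10 = c("1", "1", "0", "0", "1", "0", "0"),
  row.names = paste0("r", 1:7)
)

# Keep columns that contain no "?" and hold more than one distinct value
df_kept <- df %>%
  select(where(~ !any(.x == "?") && n_distinct(.x) > 1))

names(df_kept)
# "c3" "c4" "c10"
```

Using `&&` here is safe because both sides reduce to a single logical value per column.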
Answer 2
Score: 2
> sapply(df, is.integer) & colMeans(df == 0) < 1 & colMeans(df == 1) < 1
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
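The logical vector above can be used directly to subset the data frame. The `is.integer` test appears to assume that columns without `?` were parsed as integers (as `read.table` would typically do), so it would return `FALSE` everywhere if every column is character, as the question describes. A type-agnostic sketch, which swaps in an explicit `?` check, might look like this. The data frame is rebuilt from the question's illustration, and `keep` and `df_kept` are illustrative names.

```r
# Sample data rebuilt from the question's illustration (assumed: all character)
df <- data.frame(
  c1  = rep("0", 7),
  c2  = rep("1", 7),
  c3  = c("0", "0", "0", "0", "0", "0", "1"),
  c4  = c("1", "1", "1", "1", "1", "1", "0"),
  c5  = c("?", "1", "1", "1", "1", "1", "1"),
  c6  = c("?", "0", "0", "0", "0", "0", "0"),
  c7  = c("?", "?", "0", "0", "0", "0", "0"),
  c8  = c("?", "?", "1", "1", "1", "1", "1"),
  c9  = c("0", "0", "1", "1", "?", "?", "0"),
  c10 = c("1", "1", "0", "0", "1", "0", "0"),
  row.names = paste0("r", 1:7)
)

# Comparing a data frame with a scalar yields a logical matrix,
# so column-wise summaries can express each condition
keep <- colSums(df == "?") == 0 &   # no "?" anywhere in the column
  colMeans(df == "0") < 1 &         # not uniformly 0
  colMeans(df == "1") < 1           # not uniformly 1

# drop = FALSE keeps the result a data frame even if one column survives
df_kept <- df[, keep, drop = FALSE]
names(df_kept)
# "c3" "c4" "c10"
```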