How do I select columns of a dataframe with character values based on the number of times a character appears in a column?
Question
I have a data frame that contains three values: 0, 1, and ?. The 0 and 1 values are character values, not numeric. I want to subset the data frame so that it includes only the columns in which the value "0" occurs at least twice and the value "1" occurs at least twice. So in the example data frame below, only columns 4 and 5 would be selected. How do I do this in R?
  x1 x2 x3 x4 x5 x6 x7
1  0  0  1  1  1  1  1
2  0  ?  1  0  1  0  1
3  0  0  1  0  1  0  1
4  0  ?  1  1  0  0  1
5  0  0  1  ?  1  0  0
# Use apply() to test, for each column, whether "0" and "1" each occur at least twice
# (sum() is used instead of table(), which returns NA for a value that is absent)
counts <- apply(your_dataframe, 2, function(col) sum(col == "0") >= 2 & sum(col == "1") >= 2)
# Select the columns that meet both conditions (drop = FALSE keeps a data frame)
selected_columns <- your_dataframe[, counts, drop = FALSE]
# Print the result
selected_columns
This code counts how often "0" and "1" occur in each column and then keeps the columns that meet both conditions, which yields the required data frame.
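To try the answers on the example, the data frame can be rebuilt by hand. The sketch below is not from the original post: it assumes the name `dat` (the name used in the first answer) and types the values in as character strings, as the question describes. It also tabulates how often "0" and "1" occur in each column. With the data exactly as printed above, x4 is the only column that has at least two of each (x5 contains a single "0"), which agrees with the output shown in the answers.

```r
# Rebuild the example data; every column is character, as in the question.
dat <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  x6 = c("1", "0", "0", "0", "0"),
  x7 = c("1", "1", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# dat == "0" is a logical matrix; colSums() counts the TRUE entries per column.
counts <- rbind(
  zeros = colSums(dat == "0"),
  ones  = colSums(dat == "1")
)
print(counts)
```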
Answer 1
Score: 4
With select + where:
library(dplyr)
dat %>%
  select(where(~ sum(.x == "0") >= 2 & sum(.x == "1") >= 2))
A base R alternative:
dat[colSums(dat == "1") >= 2 & colSums(dat == "0") >= 2]
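The base R one-liner works because `dat == "1"` produces a logical matrix and `colSums()` counts the TRUE entries in each column. If the same test is needed for other values or thresholds, it could be wrapped in a small function. The sketch below is only an illustration: `select_by_counts`, its arguments, and the `toy` data frame are hypothetical names made up here, not part of the answer. It also assumes that missing values should simply be ignored, hence `na.rm = TRUE`.

```r
# Hypothetical helper (not from the answers): keep the columns in which
# every value in `values` occurs at least `min_n` times.
select_by_counts <- function(data, values = c("0", "1"), min_n = 2) {
  keep <- vapply(
    data,
    function(col) {
      # TRUE only if each requested value reaches the threshold in this column
      all(vapply(values,
                 function(v) sum(col == v, na.rm = TRUE) >= min_n,
                 logical(1)))
    },
    logical(1)
  )
  # drop = FALSE keeps a data frame even when a single column qualifies
  data[, keep, drop = FALSE]
}

# Small made-up data frame, for illustration only.
toy <- data.frame(
  p = c("0", "0", "1", "1", "?"),
  q = c("1", "1", "1", "0", NA),
  r = c("0", "1", "0", "1", "0"),
  stringsAsFactors = FALSE
)

kept <- names(select_by_counts(toy))
print(kept)
```

Here column q is dropped because it contains only one "0", while p and r each contain at least two "0" and two "1" values.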
Answer 2
Score: 1
Using data.table:
library(data.table)
setDT(dt)
cols <- dt[, colSums(.SD == "0") >= 2 & colSums(.SD == "1") >= 2]
dt[, ..cols]
# x4
# 1: 1
# 2: 0
# 3: 0
# 4: 1
# 5: ?
Answer 3
Score: 0
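Here is a more complicated approach:
library(dplyr)
library(purrr)
# One row per column of df, holding its counts of "0" and "1"
to_select <- df %>%
  purrr::map_df(~ list(`0` = sum(. == "0"),
                       `1` = sum(. == "1"))) %>%
  mutate(x = row_number()) %>%            # x is the column position in df
  filter(if_all(c(`0`, `1`), ~ . >= 2))   # keep rows where both counts are at least 2
# Select every qualifying column by position
df %>%
  select(pull(to_select, x))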
x4
1 1
2 0
3 0
4 1
5 ?