How do I select columns of a dataframe with character values based on the number of times a character appears in a column?

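Question

I have a data frame that contains three values: 0, 1, and ?. The 0 and 1 values are character values, not numeric. I want to subset the data frame so that it includes only the columns in which the value "0" occurs at least twice and the value "1" occurs at least twice. In the example data frame below, only column x4 would be selected. How do I do this in R?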

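# Use apply() and table() to count how often "0" and "1" occur in each column.
# factor() with fixed levels makes table() report 0 for a value that is absent;
# plain table(col)["0"] would be NA there and break the subsetting below.
counts <- apply(your_dataframe, 2, function(col) {
  n <- table(factor(col, levels = c("0", "1")))
  n[["0"]] >= 2 && n[["1"]] >= 2
})

# Keep the columns that satisfy the condition
selected_columns <- your_dataframe[, counts, drop = FALSE]

# Print the result
selected_columns

This code counts the occurrences of "0" and "1" in each column and then keeps only the columns that meet the condition.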


  x1 x2 x3 x4 x5 x6 x7
1  0  0  1  1  1  1  1
2  0  ?  1  0  1  0  1
3  0  0  1  0  1  0  1
4  0  ?  1  1  0  0  1
5  0  0  1  ?  1  0  0
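
To try the answers below, the example can be rebuilt roughly as follows. This is only a sketch: the name `dat` is an assumption borrowed from the first answer, and every column is assumed to be stored as character, as the question describes.

```r
# Rebuild the example data frame; all columns are character, not numeric.
# The name `dat` is an assumption taken from the first answer.
dat <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  x6 = c("1", "0", "0", "0", "0"),
  x7 = c("1", "1", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# Count how often "0" and "1" occur in each column
rbind(zeros = colSums(dat == "0"), ones = colSums(dat == "1"))
```

With this data, x4 is the only column that contains at least two "0" values and at least two "1" values.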

Answer 1

Score: 4

With select + where:

library(dplyr)
dat %>%
  select(where(~ sum(.x == "0") >= 2 & sum(.x == "1") >= 2))

A base R alternative:

dat[colSums(dat == "1") >= 2 & colSums(dat == "0") >= 2]
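
If the data might contain NA, both versions above would produce NA counts for the affected columns. One possible way around that, sketched here under assumptions, is a small base R helper. The function name `select_by_counts` and its parameters `values` and `min_n` are hypothetical and not part of the answer; the helper is meant for plain data frames.

```r
# Hypothetical helper (not from the answer): keep the columns in which every
# value listed in `values` occurs at least `min_n` times. NAs are ignored.
select_by_counts <- function(data, values = c("0", "1"), min_n = 2) {
  keep <- vapply(
    data,
    function(col) {
      # TRUE only if each requested value reaches the minimum count
      all(vapply(values, function(v) sum(col == v, na.rm = TRUE) >= min_n, logical(1)))
    },
    logical(1)
  )
  # drop = FALSE keeps a data frame even when a single column is selected
  data[, keep, drop = FALSE]
}

# Small example with a missing value; only column `a` qualifies
toy <- data.frame(
  a = c("0", "0", "1", "1", NA),
  b = c("0", "1", "1", "1", "?"),
  stringsAsFactors = FALSE
)
names(select_by_counts(toy))
```

Passing the thresholds as arguments also makes it easy to ask for a different minimum count or a different set of values.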

Answer 2

Score: 1

Using data.table:

library(data.table)
setDT(dt)

cols <- dt[, colSums(.SD == "0") >= 2 & colSums(.SD == "1") >= 2]
dt[, ..cols]

#    x4
# 1:  1
# 2:  0
# 3:  0
# 4:  1
# 5:  ?
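
A variation on the same idea, shown only as a sketch, converts the logical flags into column names before selecting, since a character vector of names is the most commonly documented use of the `..` prefix. The reduced three-column table and the names `flags` and `keep` are assumptions made for illustration.

```r
library(data.table)

# Reduced example data; the name `dt` follows the answer above
dt <- data.table(
  x1 = c("0", "0", "0", "0", "0"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1")
)

# One logical flag per column, named after the column
flags <- dt[, colSums(.SD == "0") >= 2 & colSums(.SD == "1") >= 2]

# Turn the flags into a character vector of column names
keep <- names(flags)[flags]

# Select by name with the `..` prefix
dt[, ..keep]
```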

Answer 3

Score: 0

Here is a more complicated approach:

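library(dplyr)
library(purrr)

# One row per column of df, holding the counts of "0" and "1"
to_select <- df %>%
  purrr::map_df(~ list(`0` = sum(. == "0"),
                       `1` = sum(. == "1"))) %>%
  mutate(x = row_number()) %>%             # x is the column position in df
  filter(if_all(c(`0`, `1`), ~ . >= 2))    # keep columns with both counts >= 2

# Select every qualifying column by position
df %>%
  select(pull(to_select, x))

#   x4
# 1  1
# 2  0
# 3  0
# 4  1
# 5  ?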

huangapple
  • Posted on February 10, 2023 at 16:47:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75408759.html