February 10, 2023, 16:47:35

How do I select columns of a dataframe with character values based on the number of times a character appears in a column?

Question

I have a dataframe that contains three values: 0, 1, and ?. The 0 and 1 values are character values and not numeric. I want to subset the dataframe so as to include only the columns in which the value "0" occurs at least twice and the value "1" occurs at least twice. So in the example data frame below, only columns 4 and 5 would be selected. How do I do this in R?

   x1 x2 x3 x4 x5 x6 x7
1  0  0  1  1  1  1  1
2  0  ?  1  0  1  0  1
3  0  0  1  0  1  0  1
4  0  ?  1  1  0  0  1
5  0  0  1  ?  1  0  0
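
To make the example reproducible, the printed table can be rebuilt as a data frame. This is only a sketch: the name `dat` is borrowed from the answers below, and the `counts` matrix is an illustrative helper, not part of the original question. Note that with the data exactly as printed, x5 contains "0" only once, so x4 is the only column that meets both conditions (which matches the output shown in the data.table answer).

```r
# Recreate the example data; every value is a character string.
dat <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  x6 = c("1", "0", "0", "0", "0"),
  x7 = c("1", "1", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# Count how often "0" and "1" occur in each column.
counts <- rbind(
  zeros = colSums(dat == "0"),
  ones  = colSums(dat == "1")
)
counts

# Names of the columns with at least two "0" and at least two "1".
names(which(counts["zeros", ] >= 2 & counts["ones", ] >= 2))
```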

Answer 1

Score: 4

With select + where:

library(dplyr)
dat %>%
  select(where(~ sum(.x == "0") >= 2 & sum(.x == "1") >= 2))

A base R alternative:

dat[colSums(dat == "1") >= 2 & colSums(dat == "0") >= 2]
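
The base R one-liner works because `dat == "1"` yields a logical matrix and `colSums()` counts the TRUE entries per column. If the data may contain missing values, the counts become NA and the subsetting no longer behaves as intended. A minimal sketch of a more general version follows; `select_by_counts`, its arguments, and the `toy` data are hypothetical names introduced here for illustration, not part of the original answer.

```r
# Hypothetical helper: keep the columns in which every value in `values`
# occurs at least `min_n` times. NA entries are ignored when counting.
select_by_counts <- function(data, values = c("0", "1"), min_n = 2) {
  # One logical vector per value: does each column contain it often enough?
  enough <- lapply(values, function(v) colSums(data == v, na.rm = TRUE) >= min_n)
  # A column is kept only if it passes the check for every value.
  keep <- Reduce(`&`, enough)
  data[, keep, drop = FALSE]
}

# Small illustrative data set with one missing value.
toy <- data.frame(
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", NA),
  stringsAsFactors = FALSE
)

res <- select_by_counts(toy)
res
```

Using `drop = FALSE` keeps the result a data frame even when only one column qualifies.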

Answer 2

Score: 1

Using data.table:

library(data.table)
setDT(dt)
cols <- dt[, colSums(.SD == "0") >= 2 & colSums(.SD == "1") >= 2]
dt[, ..cols]
#    x4
# 1:  1
# 2:  0
# 3:  0
# 4:  1
# 5:  ?

Answer 3

Score: 0

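Here is a much more complicated approach:

library(dplyr)
library(purrr)
# One row per column of df, holding the number of "0" and "1" values
to_select <- df %>%
  purrr::map_df(~list(`0` = sum(. == "0"),
                      `1` = sum(. == "1")
                      )) %>%
  # x records the position of the column in df
  mutate(x = row_number()) %>%
  # keep only columns where both counts are at least 2
  filter(if_all(c(`0`, `1`), ~ . >= 2))
# Select every matching column by position
df %>%
  select(all_of(to_select$x))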
