2023-03-03 17:58:12

How to select columns in R based on exclusive shared values?

Question

I have a dataframe that contains the following 8 values: 0, 1, 2, 3, 4, 5, ?, and -.
The values from 0 to 5 are characters and not numeric. For a given set of rows, I want to know what column values they share exclusively. For example, in the table below, I want to select the columns that have the same values for rows 3-5, but only when their values are exclusive to those three rows. So I want to select x6, x7, and x8, but not x5 (because the '2' value is found in row 2 as well). How do I do this in R?

 x1 x2 x3 x4 x5 x6 x7 x8 x9
 1  0  5  1  1  ?  0  5  5
 2  1  ?  1  2  5  1  -  5
 3  2  1  3  2  1  3  ?  4
 4  3  ?  4  2  1  3  ?  4
 5  4  0  1  2  1  3  ?  2
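
For anyone who wants to run the answers, the table can be rebuilt as follows. This is a sketch that assumes the data frame is called df (the name the answers use) and that every column is stored as character, as described above:

```r
# Rebuild the example table; each vector is one column, read top to bottom.
# stringsAsFactors = FALSE keeps the values as plain character strings.
df <- data.frame(
  x1 = c("1", "2", "3", "4", "5"),
  x2 = c("0", "1", "2", "3", "4"),
  x3 = c("5", "?", "1", "?", "0"),
  x4 = c("1", "1", "3", "4", "1"),
  x5 = c("1", "2", "2", "2", "2"),
  x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"),
  x8 = c("5", "-", "?", "?", "?"),
  x9 = c("5", "5", "4", "4", "2"),
  stringsAsFactors = FALSE
)
```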

Answer 1

Score: 3

Here's a function that you can try -

select_exclusive_values <- function(x, rows) {
  # Keep only the values in the rows of interest
  sub_x <- x[rows]
  # Collect their unique values
  unq_sub_x <- unique(sub_x)
  # TRUE only if those rows share a single value
  # and that value occurs in no other row
  length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
}

This checks a single column:

select_exclusive_values(df$x1, 3:5)
#[1] FALSE
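
To see both conditions at work, here is a small self-contained sketch that compares the x5 and x6 columns from the question, typed in by hand. The standalone vectors x5 and x6 are only for illustration, and the function is restated so the snippet runs on its own:

```r
# Same helper as above, repeated so this snippet is self-contained
select_exclusive_values <- function(x, rows) {
  sub_x <- x[rows]
  unq_sub_x <- unique(sub_x)
  length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
}

# Columns x5 and x6 from the question's table
x5 <- c("1", "2", "2", "2", "2")
x6 <- c("?", "5", "1", "1", "1")

# Rows 3-5 of x5 all hold "2", but "2" also appears in row 2
res_x5 <- select_exclusive_values(x5, 3:5)  # FALSE
# Rows 3-5 of x6 all hold "1", and "1" appears nowhere else
res_x6 <- select_exclusive_values(x6, 3:5)  # TRUE
```

The first condition alone would accept x5; it is the second condition, the comparison against x[-rows], that rejects it.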

You can use sapply to apply it to every column of the data frame.

rows <- 3:5
res <- sapply(df, select_exclusive_values, rows)
res
#   x1    x2    x3    x4    x5    x6    x7    x8    x9 
#FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

To select those columns -

df[res]
#  x6 x7 x8
#1  ?  0  5
#2  5  1  -
#3  1  3  ?
#4  1  3  ?
#5  1  3  ?

Answer 2

Score: 2

With dplyr, select the columns where every value in the chosen rows equals the first of them, and where that first value differs from every value in the remaining rows.

library(dplyr)
r <- 3:5
df %>%
  select(where(~ all(.x[r] == first(.x[r])) && all(first(.x[r]) != .x[-r])))
  
  x6 x7 x8
1  ?  0  5
2  5  1  -
3  1  3  ?
4  1  3  ?
5  1  3  ?
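
The same predicate can also be tried without dplyr. The following is only a sketch: is_exclusive and df_small are hypothetical names, x[r][1] stands in for first(.x[r]), base R's Filter() takes the place of select(where(...)), and df_small holds just columns x5 to x8 from the question.

```r
# Columns x5..x8 from the question's table, stored as character
df_small <- data.frame(
  x5 = c("1", "2", "2", "2", "2"),
  x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"),
  x8 = c("5", "-", "?", "?", "?"),
  stringsAsFactors = FALSE
)

r <- 3:5

# TRUE when rows r all equal their first value
# and that value is absent from every other row
is_exclusive <- function(x) {
  all(x[r] == x[r][1]) && all(x[r][1] != x[-r])
}

# Filter() keeps the columns for which the predicate returns TRUE
kept <- Filter(is_exclusive, df_small)
names(kept)
```

Here x5 is dropped because its shared value "2" also appears in row 2, so only x6, x7 and x8 remain.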
