How to select columns in R based on exclusive shared values?
Question
I have a data frame that contains the following 8 values: 0, 1, 2, 3, 4, 5, ?, and -. The values 0 to 5 are stored as characters, not as numerics. For a given set of rows, I want to know which column values they share exclusively. For example, in the table below, I want to select the columns that have the same value in rows 3-5, but only when that value occurs in no other row. So I want to select x6, x7, and x8, but not x5 (because the value '2' also appears in row 2). How do I do this in R?
 x1 x2 x3 x4 x5 x6 x7 x8 x9
 1  0  5  1  1  ?  0  5  5
 2  1  ?  1  2  5  1  -  5
 3  2  1  3  2  1  3  ?  4
 4  3  ?  4  2  1  3  ?  4
 5  4  0  1  2  1  3  ?  2
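To make the answers below reproducible, the table can be rebuilt as a data frame. This is only a sketch: it assumes the object is called df (the name the answers use) and that every column, including x1, is stored as character, as the question describes.

```r
# Rebuild the example table from the question.
# Assumption: all nine columns are character vectors.
df <- data.frame(
  x1 = c("1", "2", "3", "4", "5"),
  x2 = c("0", "1", "2", "3", "4"),
  x3 = c("5", "?", "1", "?", "0"),
  x4 = c("1", "1", "3", "4", "1"),
  x5 = c("1", "2", "2", "2", "2"),
  x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"),
  x8 = c("5", "-", "?", "?", "?"),
  x9 = c("5", "5", "4", "4", "2"),
  stringsAsFactors = FALSE  # keep characters as characters on older R versions
)

# Inspect the column types
str(df)
```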
Answer 1
Score: 3
Here's a function you can try:
select_exclusive_values <- function(x, rows) {
  # Keep only the values in the rows of interest
  sub_x <- x[rows]
  # Collapse them to their unique values
  unq_sub_x <- unique(sub_x)
  # TRUE only if the rows of interest share a single value
  # and that value does not occur in any other row
  length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
}
This checks a single column:
select_exclusive_values(df$x1, 3:5)
#[1] FALSE
Use sapply to apply it to every column of the data frame:
rows <- 3:5
res <- sapply(df, select_exclusive_values, rows)
res
#   x1    x2    x3    x4    x5    x6    x7    x8    x9 
#FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE 
To select those columns:
df[res]
#  x6 x7 x8
#1  ?  0  5
#2  5  1  -
#3  1  3  ?
#4  1  3  ?
#5  1  3  ?
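If a column can contain NA, the comparison with != returns NA and the result can no longer be used to index the data frame. A possible variant, sketched here under the assumption that NA should never count as a shared value, swaps != for %in% and uses vapply so the result is always a logical vector. The function name select_exclusive_values_safe and the column x10 are hypothetical and only serve as illustration.

```r
# Hypothetical NA-tolerant variant of select_exclusive_values()
select_exclusive_values_safe <- function(x, rows) {
  inside  <- unique(x[rows])  # distinct values in the rows of interest
  outside <- x[-rows]         # values in all remaining rows
  # One shared, non-missing value that never appears outside the rows
  length(inside) == 1 && !is.na(inside) && !(inside %in% outside)
}

# Small demo: x5, x6 and x8 come from the question,
# x10 is an invented column with a missing value outside rows 3-5
demo <- data.frame(
  x5  = c("1", "2", "2", "2", "2"),
  x6  = c("?", "5", "1", "1", "1"),
  x8  = c("5", "-", "?", "?", "?"),
  x10 = c("0", NA, "3", "3", "3"),
  stringsAsFactors = FALSE
)

# vapply guarantees one logical value per column
res <- vapply(demo, select_exclusive_values_safe, logical(1), rows = 3:5)

# x5 is rejected; x6, x8 and x10 are kept
names(demo)[res]
```

With the original function, x10 would yield NA rather than TRUE, because "3" != NA is NA.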
Answer 2
Score: 2
With dplyr, select the columns where, among the selected rows, all values equal the first one, and where that first value does not appear in any of the other rows.
library(dplyr)
r = 3:5
df %>% 
  select(where(~ all(.x[r] == first(.x[r])) && all(first(.x[r]) != .x[-r])))
  
#  x6 x7 x8
#1  ?  0  5
#2  5  1  -
#3  1  3  ?
#4  1  3  ?
#5  1  3  ?
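The same test can be wrapped in a small helper so that the row set becomes a parameter. This is a sketch rather than part of the answer: select_exclusive_cols is a hypothetical name, and it assumes the columns contain no NA values.

```r
library(dplyr)

# Hypothetical helper: keep the columns whose value is identical
# across `rows` and absent from every other row
select_exclusive_cols <- function(data, rows) {
  data %>%
    select(where(function(col) {
      target <- col[rows][1]  # first value among the selected rows
      all(col[rows] == target) && !(target %in% col[-rows])
    }))
}

# Three columns taken from the question's table
demo <- data.frame(
  x5 = c("1", "2", "2", "2", "2"),
  x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"),
  stringsAsFactors = FALSE
)

names(select_exclusive_cols(demo, 3:5))  # x6 and x7
names(select_exclusive_cols(demo, 2:5))  # x5 only
```

Passing a plain function to where() instead of a formula lets the helper pick up rows from its own arguments rather than from a global variable.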