How to select columns in R based on exclusive shared values?

Question

I have a data frame whose cells take one of the following 8 values: 0, 1, 2, 3, 4, 5, ?, and -.
The values 0 to 5 are stored as characters, not numerics. For a given set of rows, I want to find the columns in which those rows share a value exclusively. For example, in the table below, I want to select the columns that have the same value in rows 3-5, but only when that value appears in no other row. So I want to select x6, x7, and x8, but not x5 (because its value '2' also appears in row 2). How do I do this in R?

 x1 x2 x3 x4 x5 x6 x7 x8 x9
 1  0  5  1  1  ?  0  5  5
 2  1  ?  1  2  5  1  -  5
 3  2  1  3  2  1  3  ?  4
 4  3  ?  4  2  1  3  ?  4
 5  4  0  1  2  1  3  ?  2
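
For anyone who wants to reproduce the example, the table above can be rebuilt roughly like this. It is a sketch that assumes the first number in each row is the x1 value (not a row name) and that every column is stored as character, as the question describes. The name df is the one the answers below use.

```r
# Rebuild the example table column by column; all values are character
df <- data.frame(
  x1 = c("1", "2", "3", "4", "5"),
  x2 = c("0", "1", "2", "3", "4"),
  x3 = c("5", "?", "1", "?", "0"),
  x4 = c("1", "1", "3", "4", "1"),
  x5 = c("1", "2", "2", "2", "2"),
  x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"),
  x8 = c("5", "-", "?", "?", "?"),
  x9 = c("5", "5", "4", "4", "2"),
  stringsAsFactors = FALSE  # keep characters as characters on older R versions
)

str(df)
```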

Answer 1

Score: 3

Here's a function you can try:

select_exclusive_values <- function(x, rows) {
  # Keep only the values in the rows of interest
  sub_x <- x[rows]
  # Get their unique values
  unq_sub_x <- unique(sub_x)
  # TRUE only if those rows share a single value
  # and that value occurs nowhere else in the column
  length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
}

This checks a single column:

select_exclusive_values(df$x1, 3:5)
#[1] FALSE
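
To see why the two conditions matter, it may help to walk through x5 (rejected) and x6 (accepted) by hand. This is only an illustration: the two vectors are typed in from the example table, and the names x5_differs and x6_differs are made up for this sketch.

```r
# Columns x5 and x6 from the example table, typed in by hand
x5 <- c("1", "2", "2", "2", "2")
x6 <- c("?", "5", "1", "1", "1")
rows <- 3:5

# Step 1: do rows 3-5 agree on a single value?
unique(x5[rows])   # "2"
unique(x6[rows])   # "1"

# Step 2: does that value differ from every value outside rows 3-5?
x5_differs <- unique(x5[rows]) != x5[-rows]
x6_differs <- unique(x6[rows]) != x6[-rows]

x5_differs   # TRUE FALSE: "2" also appears in row 2, so x5 is rejected
x6_differs   # TRUE TRUE: "1" appears nowhere else, so x6 is kept
```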

You can use sapply to apply it to every column in the data frame.

rows <- 3:5
res <- sapply(df, select_exclusive_values, rows)

res
#   x1    x2    x3    x4    x5    x6    x7    x8    x9 
#FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE 
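
If you only need the column names, or want the result type to be guaranteed, something like the following might work. It repeats the example data and the function so that it runs on its own. Using vapply instead of sapply is a substitution made for this sketch, not part of the original answer, and exclusive_cols is a made-up name.

```r
# Example data, assuming every column is character
df <- data.frame(
  x1 = c("1", "2", "3", "4", "5"), x2 = c("0", "1", "2", "3", "4"),
  x3 = c("5", "?", "1", "?", "0"), x4 = c("1", "1", "3", "4", "1"),
  x5 = c("1", "2", "2", "2", "2"), x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"), x8 = c("5", "-", "?", "?", "?"),
  x9 = c("5", "5", "4", "4", "2"), stringsAsFactors = FALSE
)

# Same function as in the answer
select_exclusive_values <- function(x, rows) {
  sub_x <- x[rows]
  unq_sub_x <- unique(sub_x)
  length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
}

rows <- 3:5
# vapply fails loudly if any column does not return exactly one logical
res <- vapply(df, select_exclusive_values, logical(1), rows = rows)

# Names of the columns that passed the check
exclusive_cols <- names(res)[res]
exclusive_cols
#[1] "x6" "x7" "x8"
```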

To select those columns:

df[res]
#  x6 x7 x8
#1  ?  0  5
#2  5  1  -
#3  1  3  ?
#4  1  3  ?
#5  1  3  ?
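
One thing to watch: the example data has no missing values, but if a column did contain NA, the != comparison could propagate NA into the result instead of TRUE or FALSE. A possible variant, under the assumption that NA should be treated like any other value, uses %in% instead. The function name select_exclusive_values_na_safe and the test vector are hypothetical, made up for this sketch.

```r
# Function from the answer, for comparison
select_exclusive_values <- function(x, rows) {
  unq_sub_x <- unique(x[rows])
  length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
}

# Variant: %in% never returns NA, so the result is always TRUE or FALSE
select_exclusive_values_na_safe <- function(x, rows) {
  unq_sub_x <- unique(x[rows])
  length(unq_sub_x) == 1 && !(unq_sub_x %in% x[-rows])
}

# Hypothetical column with a missing value outside the rows of interest
x <- c("a", NA, "b", "b", "b")

original_result <- select_exclusive_values(x, 3:5)
na_safe_result  <- select_exclusive_values_na_safe(x, 3:5)

original_result   # NA, because "b" != NA is NA
na_safe_result    # TRUE, because "b" is not among the other values
```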

Answer 2

Score: 2

With dplyr, select the columns where every value in the selected rows equals the first of them, and where that value differs from the value in every other row.

library(dplyr)
r <- 3:5
df %>%
  select(where(~ all(.x[r] == first(.x[r])) && all(first(.x[r]) != .x[-r])))
  
  x6 x7 x8
1  ?  0  5
2  5  1  -
3  1  3  ?
4  1  3  ?
5  1  3  ?
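
If you would rather not depend on dplyr, the same condition can probably be expressed with base R's Filter(), which keeps the columns of a data frame for which a predicate returns TRUE. This is a sketch, not part of the original answer; is_exclusive and result are made-up names, and the data is the example table with all columns assumed to be character.

```r
# Example data, assuming every column is character
df <- data.frame(
  x1 = c("1", "2", "3", "4", "5"), x2 = c("0", "1", "2", "3", "4"),
  x3 = c("5", "?", "1", "?", "0"), x4 = c("1", "1", "3", "4", "1"),
  x5 = c("1", "2", "2", "2", "2"), x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"), x8 = c("5", "-", "?", "?", "?"),
  x9 = c("5", "5", "4", "4", "2"), stringsAsFactors = FALSE
)

r <- 3:5

# Same two conditions as the dplyr version, written as a plain function
is_exclusive <- function(col) {
  first_val <- col[r][1]
  all(col[r] == first_val) && all(first_val != col[-r])
}

# Filter() applies the predicate to each column and keeps the TRUE ones
result <- Filter(is_exclusive, df)
names(result)
#[1] "x6" "x7" "x8"
```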

huangapple
  • Posted on March 3, 2023 at 17:58:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75625582.html