How to select columns in R based on exclusive shared values?

Question

I have a data frame that contains the following 8 values: 0, 1, 2, 3, 4, 5, ?, and -. The values 0 to 5 are stored as characters, not numerics. For a given set of rows, I want to know which column values they share exclusively. For example, in the table below, I want to select the columns that have the same value in rows 3-5, but only when that value occurs in no other row. So I want to select x6, x7, and x8, but not x5 (because its value '2' also appears in row 2). How do I do this in R?

      x1 x2 x3 x4 x5 x6 x7 x8 x9
    1  1  0  5  1  1  ?  0  5  5
    2  2  1  ?  1  2  5  1  -  5
    3  3  2  1  3  2  1  3  ?  4
    4  4  3  ?  4  2  1  3  ?  4
    5  5  4  0  1  2  1  3  ?  2
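
For anyone who wants to reproduce this, the table above can be entered as a data frame along the following lines. This is only a sketch: the name `df` matches what the answers use, and `stringsAsFactors = FALSE` is an added assumption so that the columns stay character on R versions before 4.0.

```r
# Example table from the question, every column stored as character.
df <- data.frame(
  x1 = c("1", "2", "3", "4", "5"),
  x2 = c("0", "1", "2", "3", "4"),
  x3 = c("5", "?", "1", "?", "0"),
  x4 = c("1", "1", "3", "4", "1"),
  x5 = c("1", "2", "2", "2", "2"),
  x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"),
  x8 = c("5", "-", "?", "?", "?"),
  x9 = c("5", "5", "4", "4", "2"),
  stringsAsFactors = FALSE  # assumption: keeps columns as character on R < 4.0
)

# Inspect the column types
str(df)
```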

Answer 1

Score: 3

Here's a function you can try:

    select_exclusive_values <- function(x, rows) {
      # Keep only the values in the rows of interest
      sub_x <- x[rows]
      # Get the distinct values among them
      unq_sub_x <- unique(sub_x)
      # TRUE only if those rows share a single value and
      # that value occurs in none of the other rows
      length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
    }

This checks a single column:

    select_exclusive_values(df$x1, 3:5)
    #[1] FALSE
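
To see why x5 is rejected while x6 is kept, the two conditions inside the function can be evaluated one at a time. This is a small illustrative sketch; the vectors are copied from the question's table, and the variable names (`x5_same`, `x5_exclusive`, and so on) are made up for this example.

```r
rows <- 3:5
x5 <- c("1", "2", "2", "2", "2")  # column x5 from the question
x6 <- c("?", "5", "1", "1", "1")  # column x6 from the question

# Condition 1: the selected rows hold a single distinct value.
x5_same <- length(unique(x5[rows])) == 1   # TRUE, the shared value is "2"
x6_same <- length(unique(x6[rows])) == 1   # TRUE, the shared value is "1"

# Condition 2: that value appears in none of the remaining rows.
x5_exclusive <- all(unique(x5[rows]) != x5[-rows])  # FALSE, row 2 is also "2"
x6_exclusive <- all(unique(x6[rows]) != x6[-rows])  # TRUE

# A column is selected only when both conditions hold.
x5_same && x5_exclusive  # FALSE
x6_same && x6_exclusive  # TRUE
```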

You can use sapply to apply it to every column of the data frame:

    rows <- 3:5
    res <- sapply(df, select_exclusive_values, rows)
    res
    #   x1    x2    x3    x4    x5    x6    x7    x8    x9
    #FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

To select those columns:

    df[res]
    #  x6 x7 x8
    #1  ?  0  5
    #2  5  1  -
    #3  1  3  ?
    #4  1  3  ?
    #5  1  3  ?
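
If only the column names are needed, the same check could be wrapped in a helper along these lines. This is a sketch rather than part of the answer: `exclusive_columns` and the `toy` data frame are hypothetical names invented for illustration. It swaps `sapply` for `vapply` so the result type is declared up front, and uses `%in%` instead of `!=`, which never returns `NA` if a column happens to contain missing values.

```r
# Hypothetical helper: names of the columns whose value in `rows` is
# constant and appears in no other row.
exclusive_columns <- function(data, rows) {
  keep <- vapply(
    data,
    function(x) {
      target <- unique(x[rows])
      # Single shared value, and that value is absent from the other rows
      length(target) == 1 && !(target %in% x[-rows])
    },
    logical(1)
  )
  names(data)[keep]
}

# Small made-up data frame for illustration.
toy <- data.frame(
  a = c("2", "2", "2", "2"),  # shared in rows 2:4, but also in row 1
  b = c("0", "5", "5", "5"),  # shared in rows 2:4 and nowhere else
  c = c("?", "-", "3", "-"),  # rows 2:4 do not agree
  stringsAsFactors = FALSE
)

exclusive_columns(toy, 2:4)
# [1] "b"
```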

Answer 2

Score: 2

With dplyr, select the columns where every value in the chosen rows equals the first of them, and where that value differs from every value in the remaining rows.

    library(dplyr)

    r <- 3:5
    df %>%
      select(where(~ all(.x[r] == first(.x[r])) && all(first(.x[r]) != .x[-r])))
    #  x6 x7 x8
    #1  ?  0  5
    #2  5  1  -
    #3  1  3  ?
    #4  1  3  ?
    #5  1  3  ?

huangapple
  • Published on March 3, 2023 at 17:58:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75625582.html