How to select columns in R based on exclusive shared values?
Question
I have a dataframe that contains the following 8 values: 0, 1, 2, 3, 4, 5, ?, and -.
The values from 0 to 5 are characters and not numeric. For a given set of rows, I want to know what column values they share exclusively. For example, in the table below, I want to select the columns that have the same values for rows 3-5, but only when their values are exclusive to those three rows. So I want to select x6, x7, and x8, but not x5 (because the '2' value is found in row 2 as well). How do I do this in R?
x1 x2 x3 x4 x5 x6 x7 x8 x9
 1  0  5  1  1  ?  0  5  5
 2  1  ?  1  2  5  1  -  5
 3  2  1  3  2  1  3  ?  4
 4  3  ?  4  2  1  3  ?  4
 5  4  0  1  2  1  3  ?  2
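For anyone who wants to reproduce the example, the table can be rebuilt as a data frame of character columns. This is only a sketch: the name df is taken from the answers, and it assumes the header's nine names line up with the nine values in each row, so x1 holds 1 to 5 and there are no row names.

```r
# Assumed reconstruction of the example table; every column is character,
# as stated in the question.
df <- data.frame(
  x1 = c("1", "2", "3", "4", "5"),
  x2 = c("0", "1", "2", "3", "4"),
  x3 = c("5", "?", "1", "?", "0"),
  x4 = c("1", "1", "3", "4", "1"),
  x5 = c("1", "2", "2", "2", "2"),
  x6 = c("?", "5", "1", "1", "1"),
  x7 = c("0", "1", "3", "3", "3"),
  x8 = c("5", "-", "?", "?", "?"),
  x9 = c("5", "5", "4", "4", "2"),
  stringsAsFactors = FALSE
)

# Inspect the structure to confirm the column types.
str(df)
```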
Answer 1
Score: 3
Here's a function that you can try:
select_exclusive_values <- function(x, rows) {
  # Keep only the values in the rows of interest
  sub_x <- x[rows]
  # Get their unique values
  unq_sub_x <- unique(sub_x)
  # TRUE only if those rows share a single value
  # and that value occurs nowhere else in the column
  length(unq_sub_x) == 1 && all(unq_sub_x != x[-rows])
}
This checks a single column:
select_exclusive_values(df$x1, 3:5)
#[1] FALSE
You can use sapply to apply it to every column in the data frame.
rows <- 3:5
res <- sapply(df, select_exclusive_values, rows)
res
#   x1    x2    x3    x4    x5    x6    x7    x8    x9
#FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE
To select those columns:
df[res]
#  x6 x7 x8
#1  ?  0  5
#2  5  1  -
#3  1  3  ?
#4  1  3  ?
#5  1  3  ?
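If only the column names are needed, a type-stable variant of the same idea can be sketched with vapply. The helper name is_exclusive is hypothetical, the data frame is rebuilt here under the assumption that the table has nine character columns and no row names, and %in% is swapped in for != so that an NA in the column would not turn the result into NA.

```r
# Assumed reconstruction of the example data; all columns read as character.
df <- read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8 x9
1 0 5 1 1 ? 0 5 5
2 1 ? 1 2 5 1 - 5
3 2 1 3 2 1 3 ? 4
4 3 ? 4 2 1 3 ? 4
5 4 0 1 2 1 3 ? 2", header = TRUE, colClasses = "character")

# Hypothetical helper: TRUE when the chosen rows share one value
# and that value does not appear in any other row of the column.
is_exclusive <- function(x, rows) {
  unq <- unique(x[rows])
  length(unq) == 1 && !(unq %in% x[-rows])
}

rows <- 3:5
# vapply guarantees one logical value per column.
res <- vapply(df, is_exclusive, logical(1), rows = rows)

# Names of the qualifying columns.
names(df)[res]
# "x6" "x7" "x8"
```

Unlike sapply, vapply fails loudly if the helper ever returns something other than a single logical, which can make mistakes easier to spot.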
Answer 2
Score: 2
With dplyr, select the columns where all values among the selected rows are equal to the first one, and where that first value also differs from every value in the remaining rows.
library(dplyr)
r <- 3:5
df %>%
  select(where(~ all(.x[r] == first(.x[r])) && all(first(.x[r]) != .x[-r])))
  x6 x7 x8
1  ?  0  5
2  5  1  -
3  1  3  ?
4  1  3  ?
5  1  3  ?
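For readers who would rather not load dplyr, roughly the same selection can be sketched in base R with Filter, which keeps the columns for which a predicate returns TRUE. The data frame is rebuilt here under the assumption of nine character columns and no row names, and the variable name out is only illustrative.

```r
# Assumed reconstruction of the example data; all columns read as character.
df <- read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8 x9
1 0 5 1 1 ? 0 5 5
2 1 ? 1 2 5 1 - 5
3 2 1 3 2 1 3 ? 4
4 3 ? 4 2 1 3 ? 4
5 4 0 1 2 1 3 ? 2", header = TRUE, colClasses = "character")

r <- 3:5

# Keep a column when rows r all equal their first value
# and that value is absent from the remaining rows.
out <- Filter(
  function(x) all(x[r] == x[r][1]) && all(x[r][1] != x[-r]),
  df
)

names(out)
# "x6" "x7" "x8"
```

Filter applied to a data frame returns a data frame, so out can be used the same way as the result of the select call.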