Get index of rows with not equal values across several columns, excluding NA
# Question
Using this data frame as an example:
a b c d e
1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA
How can I get TRUE for the rows where the values in columns `b`, `c` and `e` are not all equal, excluding NAs? The idea is to get TRUE (or the index) for the following rows:
a b c d e
3 z v B NA
4 x w T w
So my intention is to get the rows where `b`, `c` and `e` are not equal. But if some of the values in a row are NA and the others are equal, the row should not count as not equal, because NAs should be ignored.
I was trying something like:
library(dplyr)
d %>%
  filter(if_all(c(c,e), ~ b == .b))
But this way I get TRUE for equal values and, in addition, I get problems with NA.
Do you know how I can solve this?
Thanks!
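# Answer 1
**Score**: 1
Here is an idea using `dplyr`:
library(dplyr)
df %>%
  rowwise() %>%
  # keep rows with more than one non-NA value and more than one distinct non-NA value
  filter(
    sum(!is.na(c_across(c('b', 'c', 'e')))) > 1,
    length(unique(na.omit(c_across(c('b', 'c', 'e'))))) > 1
  ) %>%
  ungroup()
# A tibble: 2 × 5
      a b     c     d     e
  <int> <chr> <chr> <chr> <chr>
1     3 z     v     B     NA
2     4 x     w     T     w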
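Since the question asks for TRUE (or the index) rather than the filtered rows, the same idea can be sketched with `mutate()` instead of `filter()`. This is only a sketch: it assumes the example data from the question, the column name `not_equal` and the object name `flagged` are made up for illustration, and `n_distinct(..., na.rm = TRUE)` is swapped in for the `unique(na.omit(...))` combination used above.

```r
library(dplyr)

# Example data from the question
df <- read.table(text =
'a b c d e
1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA',
header = TRUE)

flagged <- df %>%
  rowwise() %>%
  # count distinct non-NA values across b, c and e in each row;
  # `not_equal` is a hypothetical column name
  mutate(not_equal = n_distinct(c_across(c('b', 'c', 'e')), na.rm = TRUE) > 1) %>%
  ungroup()

flagged$not_equal         # logical vector, one value per row
which(flagged$not_equal)  # row indices where the values differ
```

With the example data this should flag rows 3 and 4 only.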
# Answer 2
**Score**: 1
apply(
  df[, c("b", "c", "e")],
  1,
  function(row) {
    row <- row[!is.na(row)]  # drop NAs before comparing
    any(row != row[1])       # TRUE if any remaining value differs from the first
  }
)
#> [1] FALSE FALSE  TRUE  TRUE FALSE
---
Where `df` is:
df <- read.table(text =
'a b c d e
1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA',
header = TRUE)
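If the row indices are wanted rather than the logical vector, the result of `apply()` can be passed to `which()`. A minimal base R sketch, assuming the same `df` as above; the names `cols`, `not_equal` and `idx` are only illustrative.

```r
# Example data from the question
df <- read.table(text =
'a b c d e
1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA',
header = TRUE)

cols <- c("b", "c", "e")

# Same row-wise comparison as in the answer, stored in a variable
not_equal <- apply(df[, cols], 1, function(row) {
  row <- row[!is.na(row)]  # ignore NAs
  any(row != row[1])       # does any remaining value differ from the first?
})

idx <- which(not_equal)  # positions of the TRUE entries
df[idx, ]                # the corresponding rows of df
```

With the example data, `idx` should point at rows 3 and 4.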
# Answer 3
**Score**: 0
I believe the OP wants to output TRUE for rows in which all values are unique, excluding NAs. We can use `table` row-wise and output TRUE if `all` counts in the table are `1` (no duplicates).
Remember to `pick` the desired columns to feed the function.
Filtering on this new index variable is straightforward.
df <- data.frame(
  id = c(1:4),
  a = c('a', 'a', 'a', 'z'),
  b = c('b', 'b', 'c', 'a'),
  c = c('c', 'b', NA, 'd'))
library(dplyr)
df |>
  mutate(index = apply(pick(a:c), 1, table) |>
           lapply(\(x) all(x == 1))
  )
a b c index
1 a b c TRUE
2 a b b FALSE
3 a c <NA> TRUE
4 z a d TRUE
A modified, simpler version, with `purrr::pmap`:
library(purrr)  # provides pmap()
df |>
  mutate(index = pmap(pick(a:c), \(...) all(table(c(...)) == 1)))
id a b c index
1 1 a b c TRUE
2 2 a b b FALSE
3 3 a c <NA> TRUE
4 4 z a d TRUE
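Note that "all values are unique" is a stricter condition than "the values are not all equal", so the two readings can disagree. A small base R sketch of the difference, using the values of row 4 from the question's data (`x`, `w`, `w`); the variable names are only illustrative.

```r
# Columns b, c and e of row 4 in the question's data
row4 <- c(b = "x", c = "w", e = "w")

# "All values unique": table() counts each value, and "w" appears twice
all_unique <- all(table(row4) == 1)

# "Not all equal, ignoring NA": more than one distinct non-NA value
not_all_equal <- length(unique(na.omit(row4))) > 1

all_unique     # FALSE
not_all_equal  # TRUE
```

The question lists row 4 as one that should be TRUE, so the second condition appears to be closer to what was asked.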
# Answer 4
**Score**: 0
With a helper function:
library(tidyverse)
data <- tibble(
  a = c(1, 2, 3, 4, 5),
  b = c("x", "y", "z", "x", "s"),
  c = c("x", "y", "v", "w", NA),
  d = c("A", "A", "B", "T", "K"),
  e = c("x", NA, NA, "w", NA)
)
# TRUE unless the non-NA values collapse to exactly one distinct value
unique_row <- function(input) {
  result <- input %>%
    na.omit() %>%
    unique()
  return(length(result) != 1)
}
data %>%
  rowwise() %>%
  filter(unique_row(c(b, c, e)))
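One edge case may be worth checking: because the helper tests `length(result) != 1`, a row in which all three values are NA has zero distinct values and is therefore kept. If such rows should be dropped instead, a variant that tests `> 1` could be used. The sketch below uses base R only, and `unique_row_gt1` is a hypothetical name, not part of the answer above.

```r
# Helper from the answer, written without the pipe
unique_row <- function(input) {
  length(unique(na.omit(input))) != 1
}

# Hypothetical variant: require at least two distinct non-NA values
unique_row_gt1 <- function(input) {
  length(unique(na.omit(input))) > 1
}

all_na <- c(NA_character_, NA_character_, NA_character_)

unique_row(all_na)               # TRUE: zero distinct values is "not one"
unique_row_gt1(all_na)           # FALSE
unique_row_gt1(c("z", "v", NA))  # TRUE: two distinct non-NA values
unique_row_gt1(c("s", NA, NA))   # FALSE: a single non-NA value
```

For the example data in the question both helpers give the same result, since no row there is entirely NA.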