Get index of rows with not equal values across several columns, excluding NA


Question


Using this data frame as an example:

```
a b c  d e
1 x x  A x
2 y y  A NA
3 z v  B NA
4 x w  T w
5 s NA K NA
```

How can I get TRUE for those rows where the values across columns b, c and e are not equal, excluding NAs? The idea is to get TRUE (or the row index) for the following rows:

```
a b c d e
3 z v B NA
4 x w T w
```

In other words, I want the rows where b, c and e are not all equal. If some of these values are NA but the remaining ones are equal, the row should not count as unequal, because NAs should be ignored.
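Stated as code, the rule is a predicate on the values of a single row: drop the NAs, and call the row unequal only if more than one distinct value is left. The sketch below is a minimal base-R illustration under that reading; the helper name `not_all_equal` is hypothetical, and `rows` just holds the b, c and e values of the five example rows.

```r
# Hypothetical helper: TRUE when the non-NA values in x are not all the same
not_all_equal <- function(x) {
  x <- x[!is.na(x)]        # ignore missing values
  length(unique(x)) > 1    # more than one distinct value left
}

# b, c and e values of the five example rows
rows <- list(
  c("x", "x", "x"),
  c("y", "y", NA),
  c("z", "v", NA),
  c("x", "w", "w"),
  c("s", NA, NA)
)

flags <- vapply(rows, not_all_equal, logical(1))
flags
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```

A row whose values are all NA ends up with zero distinct values and is therefore not flagged.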

I was trying something like:

```r
library(dplyr)

# keeps only rows where b equals both c and e; rows with NA in c or e are dropped
d %>%
  filter(if_all(c(c, e), ~ b == .x))
```

But this way I get TRUE for the rows with equal values, which is the opposite of what I want, and I also run into problems with NA.
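The NA problem most likely comes from how `==` handles missing values: comparing anything with NA gives NA rather than FALSE, and `dplyr::filter()` keeps only rows where the condition is TRUE, so NA rows are silently dropped. A small base-R illustration, using the b and e values from the example table (the `conflict` vector is an illustrative name and only compares b with e):

```r
b <- c("x", "y", "z", "x", "s")
e <- c("x", NA, NA, "w", NA)

b == e
#> [1]  TRUE    NA    NA FALSE    NA

# which() keeps only the TRUE positions, so NA rows vanish,
# much like they do in filter()
which(b == e)
#> [1] 1

# Treating NA as "no conflict" has to be spelled out explicitly
conflict <- !is.na(e) & b != e
conflict
#> [1] FALSE FALSE FALSE  TRUE FALSE
```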

Do you know how I can solve this?

Thanks!

Answer 1

Score: 1

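Here is an idea using `dplyr`:

```r
library(dplyr)

df %>%
  rowwise() %>%
  filter(
    # at least two non-NA values among b, c and e ...
    sum(!is.na(c_across(c('b', 'c', 'e')))) > 1,
    # ... and more than one distinct value once NAs are dropped
    length(unique(na.omit(c_across(c('b', 'c', 'e'))))) > 1
  ) %>%
  ungroup()
```

```
# A tibble: 2 × 5
      a b     c     d     e
  <int> <chr> <chr> <chr> <chr>
1     3 z     v     B     NA
2     4 x     w     T     w
```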
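`rowwise()` evaluates the condition once per row, which is fine here but can be slow on large data. As an alternative that is not part of this answer, a vectorized base-R sketch takes the first non-NA value of each row as a reference and flags the rows where any non-NA value differs from it. The function `rows_differ` is a hypothetical name, and the sketch assumes the selected columns can be compared as character strings.

```r
# Hypothetical helper: TRUE for rows whose non-NA values in `cols` are not all equal
rows_differ <- function(data, cols) {
  m <- as.matrix(data[cols])
  # Reference value per row: first non-NA value across the columns
  ref <- m[, 1]
  for (j in seq_len(ncol(m))[-1]) {
    ref <- ifelse(is.na(ref), m[, j], ref)
  }
  # m != ref compares cell [i, j] with ref[i] (ref is recycled down each column);
  # NA cells count as "no conflict"
  differs <- (!is.na(m)) & (m != ref)
  rowSums(differs) > 0
}

df <- data.frame(
  a = 1:5,
  b = c("x", "y", "z", "x", "s"),
  c = c("x", "y", "v", "w", NA),
  d = c("A", "A", "B", "T", "K"),
  e = c("x", NA, NA, "w", NA)
)

rows_differ(df, c("b", "c", "e"))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```

The result can be passed straight to `filter()` or `which()`.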

Answer 2

Score: 1

```r
apply(
  df[, c("b", "c", "e")],
  1,
  function(row) {
    row <- row[!is.na(row)]   # drop the NAs in this row
    any(row != row[1])        # does any remaining value differ from the first?
  }
)
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```

---

Where `df` is:

```r
df <- read.table(text =
'a b c d e
1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA',
header = TRUE)
```
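Because the question asks for TRUE *or the index*, the logical vector returned by `apply()` can be converted to row numbers with `which()`, or used to subset the rows directly. A short self-contained sketch; `flags` and `unequal_rows` are illustrative names.

```r
df <- read.table(text =
'a b c d e
1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA',
header = TRUE)

# Same per-row test as in this answer
flags <- apply(df[, c("b", "c", "e")], 1, function(row) {
  row <- row[!is.na(row)]
  any(row != row[1])
})

which(flags)                  # row indices
#> [1] 3 4

unequal_rows <- df[flags, ]   # the matching rows themselves
```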

Answer 3

Score: 0
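I believe the OP wants to output TRUE for rows in which all values are unique, excluding NAs. We can use `table` row-wise and output TRUE if `all` values of the table are `1` (no duplicates).
Remember to `pick` the desired columns to feed the function.
Filtering on this new index variable is straightforward.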

```r
df <- data.frame(
  id = 1:4,
  a = c('a', 'a', 'a', 'z'),
  b = c('b', 'b', 'c', 'a'),
  c = c('c', 'b', NA, 'd'))

library(dplyr)

df |>
  # table() drops NAs by default; TRUE when every remaining value occurs once
  mutate(index = apply(pick(a:c), 1, \(x) all(table(x) == 1)))
```

```
  id a b    c index
1  1 a b    c  TRUE
2  2 a b    b FALSE
3  3 a c <NA>  TRUE
4  4 z a    d  TRUE
```

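A modified, simpler version, with `purrr::pmap_lgl` (which returns a plain logical vector rather than a list, so `index` can be passed straight to `filter()`):

```r
df |>
  mutate(index = purrr::pmap_lgl(pick(a:c), \(...) all(table(c(...)) == 1)))
```

```
  id a b    c index
1  1 a b    c  TRUE
2  2 a b    b FALSE
3  3 a c <NA>  TRUE
4  4 z a    d  TRUE
```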
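Note that "all values unique" and "not all values equal" are different conditions. On the question's data they disagree on two rows: row 4 (x, w, w) is not all equal, so the question expects TRUE, but it contains a duplicate, so the `table()` test gives FALSE; row 5 (s, NA, NA) has a single non-NA value, so the `table()` test gives TRUE while the question expects FALSE. A base-R sketch comparing both tests; the helper names are hypothetical.

```r
all_unique    <- function(x) all(table(x) == 1)            # the table() test above
not_all_equal <- function(x) length(unique(na.omit(x))) > 1  # the question's rule

# b, c and e values of the question's five rows
rows <- list(
  c("x", "x", "x"),
  c("y", "y", NA),
  c("z", "v", NA),
  c("x", "w", "w"),
  c("s", NA, NA)
)

vapply(rows, all_unique, logical(1))
#> [1] FALSE FALSE  TRUE FALSE  TRUE
vapply(rows, not_all_equal, logical(1))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```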


Answer 4

Score: 0

With a helper function:

```r
library(tidyverse)

data <- tibble(
  a = c(1, 2, 3, 4, 5),
  b = c("x", "y", "z", "x", "s"),
  c = c("x", "y", "v", "w", NA),
  d = c("A", "A", "B", "T", "K"),
  e = c("x", NA, NA, "w", NA)
)

# TRUE when the non-NA values of `input` are not all equal
unique_row <- function(input) {
  result <- input %>%
    na.omit() %>%
    unique()
  return(length(result) > 1)
}

data %>%
  rowwise() %>%
  filter(unique_row(c(b, c, e)))
```
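Whether the helper compares the length with `!= 1` or `> 1` only matters for a row where every selected column is NA: after `na.omit()` nothing is left, so the length is 0, which is `!= 1` but not `> 1`. A minimal base-R check with a hypothetical all-NA row:

```r
x <- c(NA_character_, NA_character_, NA_character_)  # hypothetical all-NA row
n_distinct_non_na <- length(unique(na.omit(x)))

n_distinct_non_na        # 0
n_distinct_non_na != 1   # TRUE: the row would be kept as "unequal"
n_distinct_non_na > 1    # FALSE: the row is not flagged
```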


huangapple
  • Posted on June 15, 2023 20:43:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76482606.html