Get index of rows with not equal values across several columns, excluding NA

Question

Using as an example this data frame:

a   b   c   d   e

1   x   x   A   x
2   y   y   A   NA
3   z   v   B   NA
4   x   w   T   w
5   s   NA  K   NA

How could I get TRUE for those rows where the values across columns b, c and e are not equal, excluding NAs? The idea is to get TRUE (or the index) for the following rows:

a   b   c   d   e

3   z   v   B   NA
4   x   w   T   w

So my intention is to get those rows where b, c and e are not equal. But if some of the values in a row are NA and the remaining ones are equal, the row should not count as not equal, since NAs should be ignored.

I was trying something like:

library(dplyr)
d %>% 
  filter(if_all(c(c,e), ~ b == .b))

But this way I get TRUE for equal values and, in addition, I run into problems with NA.

Do you know how I can solve this?

Thanks!

Answer 1

Score: 1

Here is an idea using `dplyr`:

    library(dplyr)

    df %>%
      rowwise() %>%
      filter(
        # the row has more than one non-NA value among b, c and e ...
        sum(!is.na(c_across(c('b', 'c', 'e')))) > 1,
        # ... and those non-NA values are not all the same
        length(unique(na.omit(c_across(c('b', 'c', 'e'))))) > 1
      ) %>%
      ungroup()

    # A tibble: 2 × 5
          a b     c     d     e
      <int> <chr> <chr> <chr> <chr>
    1     3 z     v     B     NA
    2     4 x     w     T     w
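If a TRUE/FALSE flag or the row index is wanted rather than the filtered rows, the same row-wise idea can be sketched with `n_distinct(..., na.rm = TRUE)`. This is a variation on the answer above, not the answerer's own code; the column name `differs` and the object name `flagged` are made up for illustration, and the data frame is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Example data from the question, rebuilt so the block is self-contained
df <- data.frame(
  a = 1:5,
  b = c("x", "y", "z", "x", "s"),
  c = c("x", "y", "v", "w", NA),
  d = c("A", "A", "B", "T", "K"),
  e = c("x", NA, NA, "w", NA)
)

flagged <- df %>%
  rowwise() %>%
  # TRUE when b, c and e hold more than one distinct non-NA value
  mutate(differs = n_distinct(c_across(c(b, c, e)), na.rm = TRUE) > 1) %>%
  ungroup()

flagged$differs
#> [1] FALSE FALSE  TRUE  TRUE FALSE

which(flagged$differs)
#> [1] 3 4
```

A row with a single non-NA value (row 5 here) has one distinct value, so it is reported as FALSE without a separate check on the number of non-NA entries.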



# Answer 2
**Score**: 1

apply(
  df[, c("b", "c", "e")],
  1,
  function(row) {
    # drop the NAs, then check whether any remaining value differs from the first
    row <- row[!is.na(row)]
    any(row != row[1])
  }
)

#> [1] FALSE FALSE TRUE TRUE FALSE


---

Where `df` is:

df <- read.table(text =
'a b c d e

1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA',
header = TRUE)
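
Since the question also asks for the index, here is a minimal base R sketch that turns the logical vector into row numbers with `which()`. The names `cols`, `differs` and `idx` are illustrative and not part of the answer; `length(unique(row)) > 1` is used as an equivalent way of writing the same test.

```r
# Example data from the question
df <- read.table(text = "a b c d e
1 x x A x
2 y y A NA
3 z v B NA
4 x w T w
5 s NA K NA", header = TRUE)

cols <- c("b", "c", "e")

# One logical per row: TRUE when the non-NA values are not all equal
differs <- apply(df[, cols], 1, function(row) {
  row <- row[!is.na(row)]
  length(unique(row)) > 1
})

# Row positions of the TRUE entries (unname() drops any row-name labels)
idx <- unname(which(differs))
idx
#> [1] 3 4

# The matching rows of the original data frame
df[idx, ]
```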





# Answer 3
**Score**: 0

I believe the OP wants to output TRUE for rows in which all values are unique, excluding NAs. We can use `table` row-wise and output TRUE if `all` counts in the table are `1` (no duplicates).
Remember to `pick` the desired columns to feed the function.
Filtering on this new index variable is straightforward.

    df <- data.frame(
      id = c(1:4),
      a = c('a', 'a', 'a', 'z'),
      b = c('b', 'b', 'c', 'a'),
      c = c('c', 'b', NA, 'd'))

    library(dplyr)

    df |>
      mutate(index = apply(pick(a:c), 1, table) |>
               # table() drops NAs by default; a count of 1 everywhere means no duplicates
               lapply(\(x) all(x == 1))
      )

      id a b    c index
    1  1 a b    c  TRUE
    2  2 a b    b FALSE
    3  3 a c <NA>  TRUE
    4  4 z a    d  TRUE

A modified, simpler version, with `purrr::pmap`:

    library(purrr)

    df |>
      mutate(index = pmap(pick(a:c), \(...) all(table(c(...)) == 1)))

      id a b    c index
    1  1 a b    c  TRUE
    2  2 a b    b FALSE
    3  3 a c <NA>  TRUE
    4  4 z a    d  TRUE
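
The filtering step mentioned in this answer could look roughly like the sketch below. It assumes `pmap_lgl()` in place of `pmap()` so that `index` is a plain logical vector that `filter()` accepts directly; that substitution and the object name `kept` are illustrative choices, not part of the answer.

```r
library(dplyr)
library(purrr)

# Example data from this answer
df <- data.frame(
  id = 1:4,
  a = c("a", "a", "a", "z"),
  b = c("b", "b", "c", "a"),
  c = c("c", "b", NA, "d")
)

kept <- df |>
  # pmap_lgl() returns a logical vector instead of a list
  mutate(index = pmap_lgl(pick(a:c), \(...) all(table(c(...)) == 1))) |>
  # keep only the rows flagged TRUE
  filter(index)

kept$id
#> [1] 1 3 4
```

Note that this follows the answer's reading of the question (every non-NA value in the row is unique), which is stricter than "the non-NA values are not all equal".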

Answer 4

Score: 0

With a helper function:

    library(tidyverse)

    data <- tibble(
      a = c(1, 2, 3, 4, 5),
      b = c("x", "y", "z", "x", "s"),
      c = c("x", "y", "v", "w", NA),
      d = c("A", "A", "B", "T", "K"),
      e = c("x", NA, NA, "w", NA)
    )

    # TRUE when the input holds more than one distinct non-NA value
    unique_row <- function(input) {
      result <- input %>%
        na.omit() %>%
        unique()

      # "> 1" rather than "!= 1", so that an all-NA row is not flagged
      return(length(result) > 1)
    }

    data %>%
      rowwise() %>%
      filter(unique_row(c(b, c, e)))

huangapple
  • Published on June 15, 2023, 20:43:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76482606.html