Filtering cases with NAs on two columns by group

Question

    df <- data.frame(
      x = c("s1","s1","s1","s1","s2","s2","s2","s2","s3","s3","s3","s3"),
      y = c("g1","g2","g3","g4","g1","g2","g3","g4","g1","g2","g3","g4"),
      z1 = c(1,2,3,2,4,5,6,6,3,2,4,NA),
      z2 = c(NA,1,2,3,1,2,3,1,1,2,1,NA)
    )

I'd like to select only those groups that do not contain NAs. I thought this would work:

    columns <- c("z1", "z2")
    df %>% group_by(x) %>% filter(all(!is.na(!!columns)))

but it doesn't seem to filter anything.
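A likely explanation for why the attempt keeps every row, shown as a small sketch: `!!columns` injects the character vector `c("z1", "z2")` itself rather than the columns it names, so `is.na()` tests two strings that are never `NA`. The sample data is rebuilt with `rep()` purely for brevity, and `kept` is a name introduced here for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Same data as in the question, built with rep() for brevity
df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA),
  stringsAsFactors = FALSE
)
columns <- c("z1", "z2")

# What the injected expression tests: the two strings, not the columns
all(!is.na(columns))

# The condition is therefore the same for every group, so nothing is dropped
kept <- df %>%
  group_by(x) %>%
  filter(all(!is.na(!!columns)))

nrow(kept) == nrow(df)
```

Both expressions evaluate to `TRUE`, which is consistent with the filter appearing to do nothing.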

Answer 1

Score: 2

One option would be to use `across`:

``` r
library(dplyr, warn.conflicts = FALSE)

columns <- c("z1", "z2")

# Keep a group only if every value in the selected columns is non-NA
df %>%
  group_by(x) %>%
  filter(all(across(all_of(columns), ~ !is.na(.x))))

#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
```

Answer 2

Score: 2

An approach using `if_all` within `filter` with *columns*:

    library(dplyr)

    columns <- c("z1", "z2")

    # all() collapses each column to one TRUE/FALSE per group
    df %>%
      group_by(x) %>%
      filter(if_all(!!columns, ~ all(!is.na(.x))))

    #> # A tibble: 4 × 4
    #> # Groups:   x [1]
    #>   x     y        z1    z2
    #>   <chr> <chr> <dbl> <dbl>
    #> 1 s2    g1        4     1
    #> 2 s2    g2        5     2
    #> 3 s2    g3        6     3
    #> 4 s2    g4        6     1

or using *tidyselect* `matches`:

    library(dplyr)

    # Select z1 and z2 by regular expression instead of by name
    df %>%
      group_by(x) %>%
      filter(if_all(matches("z[12]"), ~ all(!is.na(.x))))

    #> # A tibble: 4 × 4
    #> # Groups:   x [1]
    #>   x     y        z1    z2
    #>   <chr> <chr> <dbl> <dbl>
    #> 1 s2    g1        4     1
    #> 2 s2    g2        5     2
    #> 3 s2    g3        6     3
    #> 4 s2    g4        6     1
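To illustrate why the `all()` inside the lambda matters, here is a minimal sketch comparing a row-wise and a group-wise filter. It assumes a dplyr version that provides `if_all()` (1.0.4 or later) and uses `all_of(columns)` for the selection; `by_row` and `by_group` are names introduced here for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Same data as in the question, built with rep() for brevity
df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA),
  stringsAsFactors = FALSE
)
columns <- c("z1", "z2")

# Without all(): the test is per row, so only rows containing an NA are dropped
by_row <- df %>%
  group_by(x) %>%
  filter(if_all(all_of(columns), ~ !is.na(.x)))

# With all(): each column yields a single value per group,
# so a group is either kept whole or dropped whole
by_group <- df %>%
  group_by(x) %>%
  filter(if_all(all_of(columns), ~ all(!is.na(.x))))

nrow(by_row)
unique(by_group$x)
```

On this data the row-wise version should keep 10 of the 12 rows, while the group-wise version keeps only the four rows of `s2`.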



Answer 3

Score: 0

Use the sjmisc package:

```R
library(dplyr)   # provides group_by(), add_count(), filter()
library(sjmisc)

df %>%
  group_by(x) %>%
  row_count(., count = NA) %>%   # count the NAs in each row
  add_count(wt = rowcount) %>%   # sum those counts within each group
  filter(n == 0)                 # keep only groups without any NA
```

Answer 4

Score: 0

Using data.table:

    library(data.table)

    df <- data.frame(
      stringsAsFactors = FALSE,
      x = c("s1","s1","s1","s1","s2", "s2","s2","s2","s3","s3","s3","s3"),
      y = c("g1","g2","g3","g4","g1", "g2","g3","g4","g1","g2","g3","g4"),
      z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
      z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
    )

    # rowSums() is NA for any row containing an NA, so a group is kept
    # only if all of its row sums are non-NA; .SD holds just the numeric columns
    setDT(df)[, .SD[all(!is.na(rowSums(.SD)))], .SDcols = is.numeric, by = x]
    #>     x z1 z2
    #> 1: s2  4  1
    #> 2: s2  5  2
    #> 3: s2  6  3
    #> 4: s2  6  1

Created on 2023-05-11 with reprex v2.0.2
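Because `.SDcols = is.numeric` restricts `.SD` to the numeric columns, the result above no longer contains `y`. As a variation (not part of the original answer), the sketch below names the two columns directly and returns `.SD` only for complete groups, which should keep `y` as well. `dt` and `complete` are names introduced here for illustration.

```r
library(data.table)

# Same data as in the answer, built with rep() for brevity
dt <- data.table(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)

# Return the group's rows only when neither column contains an NA;
# an `if` without `else` yields NULL, which drops the group
complete <- dt[, if (!anyNA(z1) && !anyNA(z2)) .SD, by = x]

names(complete)
unique(complete$x)
```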

huangapple
  • This article was posted on 2023-05-11 06:03:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76222864.html