Filtering cases with NAs on two columns by group


Question

``` r
df <- data.frame(
  x  = c("s1","s1","s1","s1","s2","s2","s2","s2","s3","s3","s3","s3"),
  y  = c("g1","g2","g3","g4","g1","g2","g3","g4","g1","g2","g3","g4"),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
```

I'd like to select only those groups that do not contain NAs. I thought this would work:

``` r
columns <- c("z1", "z2")
df %>% group_by(x) %>% filter(all(!is.na(!!columns)))
```

but it doesn't seem to filter anything.
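
A likely reason nothing is filtered: `!!` injects the *value* of `columns` into the expression, so `is.na()` is applied to the strings `"z1"` and `"z2"` rather than to the columns they name. Neither string is `NA`, so the condition is `TRUE` in every group and every row is kept. The sketch below reproduces this, assuming dplyr is installed; the object name `unfiltered` is only illustrative.

``` r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# `!!columns` splices the literal strings "z1" and "z2" into the call,
# so is.na() tests the strings themselves, not the data in those columns.
all(!is.na(columns))   # TRUE, regardless of the group

unfiltered <- df %>%
  group_by(x) %>%
  filter(all(!is.na(!!columns)))
nrow(unfiltered)       # 12: no rows are removed
```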

Answer 1

Score: 2

One option would be to use `across()`:

``` r
library(dplyr, warn.conflicts = FALSE)
columns <- c("z1", "z2")
df %>%
  group_by(x) %>%
  filter(all(across(all_of(columns), ~ !is.na(.x))))
#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
```
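
To see what the per-column test looks like before `all()` collapses it, the same predicate can be evaluated in `summarise()`, which returns one row per group. This is an illustrative sketch assuming dplyr is installed; `per_group` is a hypothetical name. A group survives the filter only when every value in its row is `TRUE`, which here is only `s2`.

``` r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# For each group of x, report per column whether it is free of NA.
per_group <- df %>%
  group_by(x) %>%
  summarise(across(all_of(columns), ~ all(!is.na(.x))))

# s1: z1 TRUE,  z2 FALSE
# s2: z1 TRUE,  z2 TRUE
# s3: z1 FALSE, z2 FALSE
per_group
```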


Answer 2

Score: 2

An approach using `if_all()` within `filter()` with *columns*:

``` r
library(dplyr)

columns <- c("z1", "z2")

df %>%
  group_by(x) %>%
  filter(if_all(!!columns, ~ all(!is.na(.x))))
#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
```
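
The rule this `if_all()` call encodes is: keep a group only if none of its rows has an `NA` in `z1` or `z2`. As a hedged base-R sketch of the same rule (a swapped-in technique, not part of this answer), `complete.cases()` flags complete rows and `ave()` applies `all()` within each value of `x`; the names `row_ok`, `keep` and `result` are hypothetical.

``` r
df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# TRUE for rows with no NA in any of the selected columns.
row_ok <- complete.cases(df[columns])

# Within each group of x, replace every flag by all() of that group's flags.
keep <- ave(row_ok, df$x, FUN = all)

# Only the rows of s2 remain (rows 5 to 8).
result <- df[keep, ]
result
```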

Or using *tidyselect*'s `matches()`:

``` r
library(dplyr)

df %>%
  group_by(x) %>%
  filter(if_all(matches("z[12]"), ~ all(!is.na(.x))))
#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
```
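
`matches()` selects columns whose names match a regular expression. A quick check of which columns `"z[12]"` picks on this data frame, assuming dplyr (which re-exports tidyselect's `matches()`); `selected` is an illustrative name.

``` r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)

# The pattern "z[12]" matches any name containing "z1" or "z2".
selected <- names(select(df, matches("z[12]")))
selected   # "z1" "z2"
```

Because the pattern is not anchored, it would also pick up names such as `z10` or `xz2`; `"^z[12]$"` restricts the match to exactly `z1` and `z2`.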

Answer 3

Score: 0

Use the sjmisc package:

``` r
library(dplyr)
library(sjmisc)

df %>%
  group_by(x) %>%
  row_count(., count = NA) %>%  # count NA in each row
  add_count(wt = rowcount) %>%  # sum for each group
  filter(n == 0)                # filter out groups with NA
```
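
The same count-then-sum idea can be sketched with dplyr alone, swapping `row_count()` for base `rowSums(is.na(...))`. This assumes dplyr ≥ 1.1.0 (for `pick()`); `n_na` and `result` are hypothetical names. Unlike `row_count()` called without column arguments, this counts `NA`s only in `z1` and `z2`.

``` r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

result <- df %>%
  # Number of NAs in the selected columns on each row.
  mutate(n_na = rowSums(is.na(pick(all_of(columns))))) %>%
  group_by(x) %>%
  # Keep only groups whose NA total is zero.
  filter(sum(n_na) == 0) %>%
  ungroup() %>%
  # Drop the helper column again.
  select(-n_na)
result
```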

Answer 4

Score: 0

Using data.table:

``` r
library(data.table)
df <- data.frame(
  stringsAsFactors = FALSE,
  x = c("s1","s1","s1","s1","s2", "s2","s2","s2","s3","s3","s3","s3"),
  y = c("g1","g2","g3","g4","g1", "g2","g3","g4","g1","g2","g3","g4"),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
setDT(df)[, .SD[all(!is.na(rowSums(.SD)))], .SDcols = is.numeric, by = x]
#>     x z1 z2
#> 1: s2  4  1
#> 2: s2  5  2
#> 3: s2  6  3
#> 4: s2  6  1
```

Created on 2023-05-11 with reprex v2.0.2
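
In this answer `rowSums(.SD)` is `NA` whenever a row contains an `NA` (`na.rm` defaults to `FALSE`), so `!is.na(rowSums(.SD))` flags complete rows and `all()` requires every row of the group to be complete. Because `.SD` holds only the `.SDcols` columns, `y` is dropped from the output. A hedged variant that keeps every column, assuming the same data, collects the row numbers `.I` of passing groups and subsets with them; `dt`, `idx` and `result` are illustrative names.

``` r
library(data.table)

dt <- data.table(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# Per group: return all row numbers (.I) if every row is complete in the
# selected columns, otherwise an empty vector (the group contributes nothing).
idx <- dt[, .I[all(complete.cases(.SD))], by = x, .SDcols = columns]$V1

# Subset the full table, so y is kept as well.
result <- dt[idx]
result
```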


huangapple
  • This article was published on 2023-05-11 06:03:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76222864.html