Filtering cases with NAs on two columns by group
Question
df <- data.frame(x=c("s1","s1","s1","s1","s2","s2","s2","s2","s3","s3","s3","s3"), y=c("g1","g2","g3","g4","g1","g2","g3","g4","g1","g2","g3","g4"), z1=c(1,2,3,2,4,5,6,6,3,2,4,NA), z2=c(NA,1,2,3,1,2,3,1,1,2,1,NA))
I'd like to select only those groups that do not contain NAs in two columns. I thought this would work:
columns <- c("z1", "z2")
df %>% group_by(x) %>% filter(all(!is.na(!!columns)))
but it doesn't seem to filter anything.
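A likely reason the attempt above returns every row: `!!columns` injects the character vector `c("z1", "z2")` itself, not the columns it names, so `is.na()` only tests two non-missing strings. The sketch below rebuilds the question's data (using `rep()` as a shorthand, which is an assumption about equivalence you can check against the original vectors) and shows what the condition evaluates to; the name `unfiltered` is introduced here for illustration only.

```r
library(dplyr)

# Same data as in the question, written more compactly with rep()
df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# After injection, the condition only looks at the two column *names*
is.na(columns)
#> [1] FALSE FALSE
all(!is.na(columns))
#> [1] TRUE

# The condition is TRUE for every group, so no rows are dropped
unfiltered <- df %>%
  group_by(x) %>%
  filter(all(!is.na(!!columns)))
nrow(unfiltered) == nrow(df)
#> [1] TRUE
```

To test the column contents instead, the names have to go through a selection helper such as `all_of()` inside `across()` or `if_all()`.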
Answer 1
Score: 2
One option would be to use `across`:
``` r
library(dplyr, warn.conflicts = FALSE)
columns <- c("z1", "z2")
df %>%
  group_by(x) %>%
  filter(all(across(all_of(columns), ~ !is.na(.x))))
#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
```
Answer 2
Score: 2
An approach using `if_all` within `filter` with *columns*:
library(dplyr)
columns <- c("z1", "z2")
df %>%
  group_by(x) %>%
  filter(if_all(!!columns, ~ all(!is.na(.x))))
#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
Or using the *tidyselect* helper `matches`:
library(dplyr)
df %>%
  group_by(x) %>%
  filter(if_all(matches("z[12]"), ~ all(!is.na(.x))))
#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
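To check which groups a filter like the ones above should keep, it can help to count the NAs per group first. This is only a diagnostic sketch: the names `na_counts` and `kept` are made up for illustration, and the data is the question's `df` rebuilt with `rep()`.

```r
library(dplyr)

# Same data as in the question, written more compactly with rep()
df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# One row per group, with the number of NAs in each selected column
# (s1: z1 = 0, z2 = 1; s2: z1 = 0, z2 = 0; s3: z1 = 1, z2 = 1)
na_counts <- df %>%
  group_by(x) %>%
  summarise(across(all_of(columns), ~ sum(is.na(.x))))
na_counts

# Groups whose counts are zero in every selected column
kept <- na_counts %>%
  filter(if_all(all_of(columns), ~ .x == 0)) %>%
  pull(x)
kept
#> [1] "s2"
```

Only `s2` has zero NAs in both columns, which matches the four rows returned by the answers.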
Answer 3
Score: 0
Use the sjmisc package:
```R
library(dplyr)   # provides %>%, group_by(), add_count() and filter()
library(sjmisc)
df %>% group_by(x) %>% row_count(., count = NA) %>% # count NAs in each row
  add_count(wt = rowcount) %>%                      # sum the row counts within each group
  filter(n == 0)                                    # keep only groups without any NA
```
Answer 4
Score: 0
Using data.table:
library(data.table)
df <- data.frame(
  stringsAsFactors = FALSE,
  x = c("s1","s1","s1","s1","s2", "s2","s2","s2","s3","s3","s3","s3"),
  y = c("g1","g2","g3","g4","g1", "g2","g3","g4","g1","g2","g3","g4"),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
setDT(df)[, .SD[all(!is.na(rowSums(.SD)))], .SDcols = is.numeric, by = x]
#>     x z1 z2
#> 1: s2  4  1
#> 2: s2  5  2
#> 3: s2  6  3
#> 4: s2  6  1
<sup>Created on 2023-05-11 with reprex v2.0.2</sup>
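The result above contains only `x` and the numeric columns, because `.SDcols` restricts `.SD`. If `y` should be kept and the columns should come from the `columns` vector, one possible variant is to flag complete groups first and subset afterwards. This is a sketch, not the answerer's method; the helper column `keep` and the names `dt` and `res` are hypothetical.

```r
library(data.table)

# Same data as in the question, written more compactly with rep()
df <- data.frame(
  x  = rep(c("s1", "s2", "s3"), each = 4),
  y  = rep(c("g1", "g2", "g3", "g4"), times = 3),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# Work on a copy so that df is not modified by reference
dt <- as.data.table(df)

# keep is TRUE for a group only if every row is complete in `columns`
dt[, keep := all(complete.cases(.SD)), by = x, .SDcols = columns]

# Subset to the complete groups, then drop the helper column
res <- dt[keep == TRUE]
res[, keep := NULL]
res[]
```

Here `res` should hold the four `s2` rows with all of `x`, `y`, `z1` and `z2`.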