2023-05-11 06:03:59

Filtering cases with NAs on two columns by group

Question

    df <- data.frame(
      x = c("s1","s1","s1","s1","s2","s2","s2","s2","s3","s3","s3","s3"),
      y = c("g1","g2","g3","g4","g1","g2","g3","g4","g1","g2","g3","g4"),
      z1 = c(1,2,3,2,4,5,6,6,3,2,4,NA),
      z2 = c(NA,1,2,3,1,2,3,1,1,2,1,NA)
    )

I'd like to select only those groups that do not contain NAs. I thought this would work:

    columns <- c("z1", "z2")
    df %>% group_by(x) %>% filter(all(!is.na(!!columns)))

but it doesn't seem to be filtering.
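
A likely reason nothing is filtered: `!!columns` injects the character vector `c("z1", "z2")` itself, so `is.na()` tests the two strings rather than the columns they name, and the condition is `TRUE` for every group. The sketch below shows this in base R and then cross-checks the intended result without dplyr. The names `row_ok`, `group_ok`, and `result` are illustrative and not part of the original post.

```r
df <- data.frame(
  stringsAsFactors = FALSE,
  x = c("s1","s1","s1","s1","s2","s2","s2","s2","s3","s3","s3","s3"),
  y = c("g1","g2","g3","g4","g1","g2","g3","g4","g1","g2","g3","g4"),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

# What the filter condition effectively evaluates: the strings, not the data.
is.na(columns)
#> [1] FALSE FALSE
all(!is.na(columns))
#> [1] TRUE

# Base R cross-check of the intended result (illustrative helper names).
row_ok   <- complete.cases(df[columns])   # TRUE where a row has no NA in z1/z2
group_ok <- tapply(row_ok, df$x, all)     # TRUE where every row of a group is complete
result   <- df[as.vector(group_ok[as.character(df$x)]), ]
unique(result$x)
#> [1] "s2"
```

Because the condition reduces to a single `TRUE`, `filter()` recycles it across each group and keeps all rows.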

Answer 1

Score: 2

One option would be to use `across`:
``` r
library(dplyr, warn.conflicts = FALSE)
columns <- c("z1", "z2")
df %>%
  group_by(x) %>%
  filter(all(across(all_of(columns), ~ !is.na(.x))))
#> # A tibble: 4 × 4
#> # Groups:   x [1]
#>   x     y        z1    z2
#>   <chr> <chr> <dbl> <dbl>
#> 1 s2    g1        4     1
#> 2 s2    g2        5     2
#> 3 s2    g3        6     3
#> 4 s2    g4        6     1
```

Answer 2

Score: 2

An approach using `if_all` within `filter` with *columns*

    library(dplyr)

    columns <- c("z1", "z2")

    df %>%
      group_by(x) %>%
      filter(if_all(!!columns, ~ all(!is.na(.x))))
    #> # A tibble: 4 × 4
    #> # Groups:   x [1]
    #>   x     y        z1    z2
    #>   <chr> <chr> <dbl> <dbl>
    #> 1 s2    g1        4     1
    #> 2 s2    g2        5     2
    #> 3 s2    g3        6     3
    #> 4 s2    g4        6     1

or using *tidyselect* `matches`

    library(dplyr)

    df %>%
      group_by(x) %>%
      filter(if_all(matches("z[12]"), ~ all(!is.na(.x))))
    #> # A tibble: 4 × 4
    #> # Groups:   x [1]
    #>   x     y        z1    z2
    #>   <chr> <chr> <dbl> <dbl>
    #> 1 s2    g1        4     1
    #> 2 s2    g2        5     2
    #> 3 s2    g3        6     3
    #> 4 s2    g4        6     1

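
A variant of the `if_all` approach, sketched here on the assumption that a reasonably recent dplyr (1.0.4 or later, where `if_all` is available) is installed: wrapping the character vector in `all_of()` makes the selection explicit, and `anyNA()` replaces `all(!is.na(...))`. The name `result` and the trailing `ungroup()` are illustrative choices, not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  stringsAsFactors = FALSE,
  x = c("s1","s1","s1","s1","s2","s2","s2","s2","s3","s3","s3","s3"),
  y = c("g1","g2","g3","g4","g1","g2","g3","g4","g1","g2","g3","g4"),
  z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
  z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
)
columns <- c("z1", "z2")

result <- df %>%
  group_by(x) %>%
  # keep a group only if none of the selected columns contains an NA
  filter(if_all(all_of(columns), ~ !anyNA(.x))) %>%
  ungroup()

unique(result$x)
#> [1] "s2"
```

`all_of()` raises an error if a name in `columns` is missing from the data, which may be preferable to a silent mismatch.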
Answer 3

Score: 0

Use the sjmisc package:

    library(dplyr)   # provides group_by(), add_count(), filter() and %>%
    library(sjmisc)

    df %>%
      group_by(x) %>%
      row_count(., count = NA) %>%  # count NAs in each row
      add_count(wt = rowcount) %>%  # sum the row counts within each group
      filter(n == 0)                # keep only groups without any NA

Answer 4

Score: 0

data.table

    library(data.table)
    df <- data.frame(
      stringsAsFactors = FALSE,
      x = c("s1","s1","s1","s1","s2", "s2","s2","s2","s3","s3","s3","s3"),
      y = c("g1","g2","g3","g4","g1", "g2","g3","g4","g1","g2","g3","g4"),
      z1 = c(1, 2, 3, 2, 4, 5, 6, 6, 3, 2, 4, NA),
      z2 = c(NA, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1, NA)
    )
    setDT(df)[, .SD[all(!is.na(rowSums(.SD)))], .SDcols = is.numeric, by = x]
    #>     x z1 z2
    #> 1: s2  4  1
    #> 2: s2  5  2
    #> 3: s2  6  3
    #> 4: s2  6  1

<sup>Created on 2023-05-11 with reprex v2.0.2</sup>
