2023年7月4日 23:08:24go评论64阅读模式

英文:

How do I delete all columns that contain NA only with dbplyr?

问题

我想删除一个数据框中所有只包含NA值的列，在`dbplyr`语法中。
使用普通的R语法没有问题，参见这里：https://stackoverflow.com/questions/2643939/remove-columns-from-dataframe-where-all-values-are-na
这是在R中的一个可能性：

library(tidyverse)
library(DBI)
library(dbplyr)

df <- data.frame(a = NA,
                 b = seq(1:5), 
                 c = c(rep(1, 4), NA))

df %>% purrr::discard(~all(is.na(.)))
# 期望输出：
# A tibble: 5 × 2
      b     c
  <int> <dbl>
1     1     1
2     2     1
3     3     1
4     4     1
5     5    NA

但现在我想在`dbplyr`语法中做到这一点。以下是一个示例：

# 创建测试数据库
con <- dbConnect(RSQLite::SQLite(), "")
# 插入测试表
dbWriteTable(con, "df", df)

con %>% 
  tbl("df") %>% 
  purrr::discard(~all(is.na(.))) %>% 
  collect()

# 错误信息：Error in UseMethod("collect") : 
# no applicable method for 'collect' applied to an object of class "list"

英文:

I want to delete all columns of a dataframe which contain NA values only, within the dbplyr syntax.
With the normal R syntax this is no problem, see here: https://stackoverflow.com/questions/2643939/remove-columns-from-dataframe-where-all-values-are-na
Here is a possibility in R:

library(tidyverse)
library(DBI)
library(dbplyr)

df &lt;- data.frame(a = NA,
                 b = seq(1:5), 
                 c = c(rep(1, 4), NA))

df %&gt;% purrr::discard(~all(is.na(.)))
# desired output:
# A tibble: 5 &#215; 2
      b     c
  &lt;int&gt; &lt;dbl&gt;
1     1     1
2     2     1
3     3     1
4     4     1
5     5    NA

But now I want to do this in the dbplyr syntax. For this the following example:

# Creating test database
con &lt;- dbConnect(RSQLite::SQLite(), &quot;&quot;)
# Inserting test table
dbWriteTable(con, &quot;df&quot;, df)

con %&gt;% 
  tbl(&quot;df&quot;) %&gt;% 
  purrr::discard(~all(is.na(.))) %&gt;% 
  collect()

#Error in UseMethod(&quot;collect&quot;) : 
# no applicable method for &#39;collect&#39; applied to an object of class &quot;list&quot;

Can anyone help me find a solution within dbplyr?

答案1

得分: 3

修订后的答案 - 删除所有值都为NA的列

这里的想法是检查每一列的NA值的数量并与行数进行比较：

df = con %>% tbl("df")

num_rows = df %>%
  ungroup() %>%
  summarise(num = n()) %>%
  collect() %>%
  pull(num)

cols_to_remove = c()

for(col in colnames(df)){
  num_na = df %>%
    ungroup() %>%
    mutate(is_na = ifelse(is.na(!!sym(col)), 1, 0)) %>%
    summarise(num_na = sum(is_na)) %>%
    collect() %>%
    pull(num_na)

  if(num_na == num_rows){
    cols_to_remove = c(cols_to_remove, col)
  }
}

local_df = df %>%
  select(-all_of(cols_to_remove)) %>%
  collect()

原始答案 - 删除所有值都为NA的行

这是一种方法：

df = con %>% tbl("df")

df = df %>% mutate(all_na = TRUE)

for(col in colnames(df)){
  df = df %>%
    mutate(all_na = ifelse(!is.na(!!sym(col)), FALSE, all_na))
}

df %>%
  filter(all_na != TRUE) %>%
  select(-all_na)

这个想法是创建一个新的列来存储行中是否存在NA值。通过迭代所有行并更新这列，我们创建了一个指示器列，表示是否存在所有NA值。

语法解释

!!sym(.) 是 rlang 的特性。它将包含字符串 "my_column" 的变量 col 转换为列名 my_column。

在 select 中使用 !!sym(.) 的地方，我们使用了 all_of 来实现类似的功能。

英文:

Revised answer - remove columns where every value is NA

The idea here is check each column for the number of NAs and compare against the number of rows:

df = con %&gt;% tbl(&quot;df&quot;)

num_rows = df %&gt;%
  ungroup() %&gt;%
  summarise(num = n()) %&gt;%
  collect() %&gt;%
  pull(num)

cols_to_remove = c()

for(col in colnames(df)){
  num_na = df %&gt;%
    ungroup() %&gt;%
    mutate(is_na = ifelse(is.na(!!sym(col)), 1, 0)) %&gt;%
    summarise(num_na = sum(is_na)) %&gt;%
    collect() %&gt;%
    pull(num_na)

  if(num_na == num_rows){
    cols_to_remove = c(cols_to_remove, col)
  }
}

local_df = df %&gt;%
  select(-all_of(cols_to_remove)) %&gt;%
  collect()

Original answer - removes rows where every value is NA

Here is one approach:

df = con %&gt;% tbl(&quot;df&quot;)

df = df %&gt;% mutate(all_na = TRUE)

for(col in colnames(df)){
  df = df %&gt;%
    mutate(all_na = ifelse(!is.na(!!sym(col)), FALSE, all_na))
}

df %&gt;%
  filter(all_na != TRUE) %&gt;%
  select(-all_na)

The idea is to create a new column to store whether NAs are found in the row. By iterating through all the rows and updating this column, we create an indicator column for the presence of all NA values.

Syntax explanation

!!sym(.) is an rlang feature. It converts from the variable col containing the string "my_column" to the column name my_column.

Instead of using !!sym(.) inside select we use all_of for a similar purpose.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用dbplyr如何删除所有只包含NA的列？

问题

答案1

修订后的答案 - 删除所有值都为NA的列

原始答案 - 删除所有值都为NA的行

语法解释

Revised answer - remove columns where every value is NA

Original answer - removes rows where every value is NA

Syntax explanation

How do I ensure reactable::getReactableState() returns the correct row selection in a Shiny app when table is regenerated?

更改集群标签以进行比较目的

移动绘图上的标签远离线。

为什么数据框输出没有按学校代码分组？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论