R (dplyr) - Summarizing a data frame using paste

Question

I'm trying to summarize my data frame using a group_by condition (as I've seen before on Stack Exchange), but for some reason the code keeps running into errors. (Note that I use Impala to pull the data, but I don't see why that would be the problem.) The goal is simply to condense my table: group by the 3 columns shown and merge the 4th into a single string. (The inner join was tested separately and worked fine.)

library(DBI)
library(dplyr)
library(dbplyr)
library(stringr)

merged_data <- inner_join(attribute_data_filtered,name_data_filtered, by = c('key' = 'assigned_key')) %>%
    arrange(key,attribute,name,login) %>%
    distinct(key,attribute,name,login, .keep_all = TRUE) %>% 
    group_by(key,name,login) %>%
    summarise(new_col= paste(attribute, collapse = "_")) %>%
    ungroup() %>%
    select(key,new_col,name,login) %>%
    collect()

The code keeps spitting out nonsense errors saying the parameter "collapse" cannot be used and should be replaced by str_flatten. And when I try using str_flatten, it says that is also invalid. Any indication of what the problem might be?

Error Message:

Error in `check_collapse()`:
! `collapse` not supported in DB translation of `paste()`.
ℹ Please use `str_flatten()` instead.

I've removed the summarise part and the code runs fine, but as soon as I put it back in I encounter the error. I also tried using mutate (instead of summarise) for fun; it ran but didn't actually paste all the individual strings together (not sure why).

Answer 1

Score: 1

dbplyr translates dplyr syntax into your database's syntax. Some dplyr/tidyr/etc. functions/options are not available for some databases. So in general, if you have

database_table |>
  do_stuff_that_translates_fine() |>
  do_stuff_that_doesnt_translate() |>
  collect()

you can replace that with

database_table |>
  do_stuff_that_translates_fine() |>
  collect() |> 
  do_stuff_that_doesnt_translate() 

So in this case, I expect that moving the collect() line above the group_by would avoid the need to translate the paste(... collapse = "_") or str_flatten() steps that aren't working.
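
Applied to the pipeline from the question, that might look roughly like the sketch below. The two small tibbles are made-up stand-ins for the remote Impala tables (their contents are assumptions for illustration only); with real tbl() objects, everything above collect() would be translated to SQL and everything below it would run locally in R. I've also moved arrange() below collect() so that the row order is settled locally before paste() runs.

```r
library(dplyr)

# Hypothetical stand-ins for the question's remote tables. With a real
# connection these would be lazy tbl() objects instead of local tibbles.
attribute_data_filtered <- tibble(
  key       = c(1, 1, 2),
  attribute = c("a", "b", "c")
)
name_data_filtered <- tibble(
  assigned_key = c(1, 2),
  name         = c("x", "y"),
  login        = c("lx", "ly")
)

merged_data <- inner_join(attribute_data_filtered, name_data_filtered,
                          by = c("key" = "assigned_key")) %>%
  distinct(key, attribute, name, login, .keep_all = TRUE) %>%
  collect() %>%   # pull rows into R; later steps are no longer translated to SQL
  arrange(key, attribute, name, login) %>%
  group_by(key, name, login) %>%
  summarise(new_col = paste(attribute, collapse = "_"), .groups = "drop") %>%
  select(key, new_col, name, login)

print(merged_data)
```

Keep in mind that this pulls every joined row into R before aggregating, so it is only practical when the joined table fits comfortably in memory.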

huangapple
  • Published on June 22, 2023 at 12:15:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76528573.html