英文:
Add/subtract date from date time format in dbplyr
问题
我正在使用dbplyr来访问我的数据库中的数据,我正在尝试计算包括前一年所有金额的滚动总和。我的日期格式为dttm。我想要获取一个表,显示开始日期之前的1年,类似下面的表格:
开始日期 | 前1年 |
---|---|
2021-07-02 08:03:29 | 2020-07-02 08:03:29 |
2021-07-09 06:36:28 | 2020-07-09 06:36:28 |
2021-07-09 09:29:02 | 2020-07-09 09:29:02 |
2021-07-16 08:26:30 | 2020-07-16 08:26:30 |
2021-07-23 07:35:51 | 2020-07-23 07:35:51 |
2021-07-30 09:19:13 | 2020-07-30 09:19:13 |
我想要从开始日期减去正好1年,以便为每一行/观测获得一个滚动的一年窗口,用于滚动总和计算。
我以前在dplyr中做过这个,通常会使用以下代码来实现这个目标:
output = df %>%
mutate(Previous_1year = Start_Date %m+% years(-1))
然而,这种方法似乎与dbplyr和我的数据库不兼容,我猜测这是因为dbplyr不知道如何将代码转化为SQL。
我还尝试在dbplyr中运行以下代码,这是我从Stack Overflow的另一个问题中找到的:
output = df %>%
mutate(Previous_1year = sql('DATEADD(y, -1, Start_Date)'))
但是,我收到以下错误,不确定为什么会出现无效的日期时间:
! nanodbc/nanodbc.cpp:1752: XX000: [Amazon][Amazon Redshift] (30)
Error occurred while trying to execute a query: [SQLState XX000] ERROR: Invalid datetime part for DATEADD().
由于数据量很大,我无法收集所有数据并通过dplyr进行数据处理,以从每个开始日期观测中减去1年。因此,希望首先在数据库中执行此操作,然后才能在收集结果之前进行计算。
英文:
I'm using dbplyr to access data in my database and I'm trying to calculate rolling sum amounts that would include all amounts from the previous 1 year. My Dates are in dttm format. I'm trying to get a table that shows 1 year prior to the Start_Date similar to below:
Start_Date | Previous_1year |
---|---|
2021-07-02 08:03:29 | 2020-07-02 08:03:29 |
2021-07-09 06:36:28 | 2020-07-09 06:36:28 |
2021-07-09 09:29:02 | 2020-07-09 09:29:02 |
2021-07-16 08:26:30 | 2020-07-16 08:26:30 |
2021-07-23 07:35:51 | 2020-07-23 07:35:51 |
2021-07-30 09:19:13 | 2020-07-30 09:19:13 |
I would like to subtract exactly 1 year from the Start_Date to get a rolling one-year window for each row/observation for rolling sum calculation.
I've done this in dplyr before and I would normally use the following code to get this:
output = df %>%
mutate(Previous_1year = Start_Date %m+% years(-1))
However, this method doesn't seem to be compatible with dbplyr and my database, assuming this is because dbplyr doesn't know how to translate the code back into SQL.
I've also tried running the following code in dbplyr which I found from another question in SO:
output = df %>%
mutate(Previous_1year = sql('DATEADD(y, -1, Start_Date)'))
but I'm getting the following error and not sure why I'm getting a invalid datetime:
! nanodbc/nanodbc.cpp:1752: XX000: [Amazon][Amazon Redshift] (30)
Error occurred while trying to execute a query: [SQLState XX000] ERROR: Invalid datetime part for DATEADD().
Because the data is so large, I'm unable to collect all of it and do the data wrangling through dplyr to subtract 1 year from each of my Start_Date observations. So, hoping to do this within the database first so then I can do my calculations before collecting the results.
答案1
得分: 1
这在Postgres上有效,所以可能在Redshift上也有效:
library(dplyr, warn.conflicts = FALSE)
library(DBI)
pg <- dbConnect(RPostgres::Postgres())
df <- tbl(pg, sql("SELECT '2021-07-02 08:03:29'::timestamp AS start"))
df %>%
mutate(previous = start - years(1))
#> # Source: SQL [1 x 2]
#> # Database: postgres [iangow@/tmp:5432/iangow]
#> start previous
#> <dttm> <dttm>
#> 1 2021-07-02 08:03:29 2020-07-02 08:03:29
Created on 2023-04-03 with reprex v2.0.2
英文:
This works on Postgres, so might work on Redshift:
library(dplyr, warn.conflicts = FALSE)
library(DBI)
pg <- dbConnect(RPostgres::Postgres())
df <- tbl(pg, sql("SELECT '2021-07-02 08:03:29'::timestamp AS start"))
df %>%
mutate(previous = start - years(1))
#> # Source: SQL [1 x 2]
#> # Database: postgres [iangow@/tmp:5432/iangow]
#> start previous
#> <dttm> <dttm>
#> 1 2021-07-02 08:03:29 2020-07-02 08:03:29
<sup>Created on 2023-04-03 with reprex v2.0.2</sup>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论