英文:
How to do partition by using date intervals in SQL
问题
以下是翻译好的代码部分:
case when
interval_start - lag(interval_end) over (partition by account_id order by interval_start) as prev_interval_end > 30 then 1 else 0 end
请注意,这只是原始代码的一部分,您需要将其与其他查询一起使用,以便正确计算“reactivated”列中的值。如果您需要更多帮助,请提供完整的查询或具体的问题描述。
英文:
I have the following sample data with 3 columns, i.e., account_id, interval_start, interval_end. And I want to create a reactivated column with the given output in the following table.
For each account_id, reactivated column should start with a 0 value and then I want to check if the previous interval_end and current interval_start value is more than a month then I want to increment that value as in to 1 and then next time when previous interval_end and current interval_start is more than a month, then it should be 2 and so on.
account_id | interval_start | interval_end | reactivated |
---|---|---|---|
ABC | 2015-01-01 00:00:00.000 | 2015-02-01 00:00:00.000 | 0 |
ABC | 2015-02-01 00:00:00.000 | 2015-03-01 00:00:00.000 | 0 |
ABC | 2015-03-01 00:00:00.000 | 2015-04-01 00:00:00.000 | 0 |
ABC | 2015-09-01 00:00:00.000 | 2015-10-01 00:00:00.000 | 1 |
ABC | 2015-10-01 00:00:00.000 | 2015-11-01 00:00:00.000 | 1 |
ABC | 2017-04-01 00:00:00.000 | 2017-05-01 00:00:00.000 | 2 |
ABC | 2017-05-01 00:00:00.000 | 2017-06-01 00:00:00.000 | 2 |
ABC | 2017-06-01 00:00:00.000 | 2017-07-01 00:00:00.000 | 2 |
ABC | 2017-07-01 00:00:00.000 | 2017-08-01 00:00:00.000 | 2 |
ABC | 2019-03-01 00:00:00.000 | 2019-04-01 00:00:00.000 | 3 |
ABC | 2019-04-01 00:00:00.000 | 2019-05-01 00:00:00.000 | 3 |
ABC | 2019-05-01 00:00:00.000 | 2019-06-01 00:00:00.000 | 3 |
ABC | 2021-07-01 00:00:00.000 | 2021-08-01 00:00:00.000 | 4 |
ABC | 2021-08-01 00:00:00.000 | 2021-09-01 00:00:00.000 | 4 |
DEF | 2015-03-01 00:00:00.000 | 2015-04-01 00:00:00.000 | 0 |
DEF | 2015-04-01 00:00:00.000 | 2015-05-01 00:00:00.000 | 0 |
DEF | 2017-06-01 00:00:00.000 | 2017-07-01 00:00:00.000 | 1 |
DEF | 2017-07-01 00:00:00.000 | 2017-08-01 00:00:00.000 | 1 |
DEF | 2019-05-01 00:00:00.000 | 2019-06-01 00:00:00.000 | 2 |
So far I'm only able to get 0s and 1s in the reactivated column by doing the following but its not what I want:
case when
interval_start - lag(interval_end) over (partition by account_id order by interval_start) as prev_interval_end > 30 then 1 else 0 end
How do I get the value to increment as well for each account_id please.
答案1
得分: 1
Here's the translated code portion:
你想要一个连续重叠行的运行计数。
使用`lag()`你已经走在正确的轨道上;你只需要再加一层窗口函数来对这些标志进行`sum`操作:
select t.*,
sum(case when interval_start > lag_interval_end then 1 else 0 end) grp
from (
select t.*,
lag(interval_end) over (partition by account_id order by interval_start) as lag_interval_end
from mytable t
) t
这是标准的SQL,在大多数支持窗口函数的数据库上都可以工作。根据你的实际数据库,可能有更简洁的方式来表达窗口的求和,例如在Postgres中:
count(*) filter(where interval_start > lag_interval_end) grp
或者在MySQL中:
sum(interval_start > lag_interval_end) grp
在Redshift中,我们需要一个`rows`子句:
sum(case when interval_start > lag_interval_end then 1 else 0 end) over (
partition by account_id
order by interval_start
rows unbounded preceding
) as grp
Please note that the code is now translated into Chinese, as per your request.
英文:
You want a running count of consecutive rows that overlap.
You are on the right track with lag()
; you just need another level of windowing to sum
those flags:
select t.*,
sum(case when interval_start > lag_interval_end then 1 else 0 end) grp
from (
select t.*,
lag(interval_end) over (partition by account_id order by interval_start) as lag_interval_end
from mytable t
) t
This is standard SQL, that should work on most databases that support window functions. Depending on your actual database, there might be neater ways to express the window sum, eg in Postgres:
count(*) filter(where interval_start > lag_interval_end) grp
Or in MySQL:
sum(interval_start > lag_interval_end) grp
In Redshift, we need a rows
clause:
sum(case when interval_start > lag_interval_end then 1 else 0 end) over (
partition by account_id
order by interval_start
rows unbounded preceding
) as grp
答案2
得分: 0
在上面的答案中添加。由于我使用Redshift,对我有效的是:
sum(case when interval_start > lag_interval_end then 1 else 0 end) over (partition by account_id order by interval_start rows unbounded preceding) as grp
英文:
Adding to above answer. Since I'm using Redshift what worked for me is:
sum(case when interval_start > lag_interval_end then 1 else 0 end) over (partition by account_id order by interval_start rows unbounded preceding) as grp
答案3
得分: 0
以下是您要翻译的内容:
解决方案 PostgreSQL
带有示例数据的查询
with dt as (
select * from (values
('ABC', DATE'2015-01-01'::timestamp,DATE'2015-02-01'::timestamp),
('ABC', DATE'2015-03-01',DATE'2015-04-01'),
('ABC', DATE'2015-09-01',DATE'2015-10-01'),
('ABC', DATE'2015-10-01',DATE'2015-11-01'),
('ABC', DATE'2017-04-01',DATE'2017-05-01'),
('ABC', DATE'2017-05-01',DATE'2017-06-01'),
('XXX', DATE'2015-01-01',DATE'2015-02-01')
) t(account_id, interval_start, interval_end)),
dt2 as (
select
account_id, interval_start, interval_end,
case
interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
when true then 1
else 0 end delta_act
from dt)
select
account_id, interval_start, interval_end, delta_act,
sum(delta_act) over(partition by account_id order by interval_start) reactivated
from dt2
order by 1,2
子查询 dt2
根据您的一个月不活动规则计算 delta activation。
case
interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
when true then 1
else 0 end delta_act
最后,您只需使用窗口函数 sum 这个 delta。
sum(delta_act) over(partition by account_id order by interval_start) reactivated
结果
account_id|interval_start |interval_end |delta_act|reactivated|
----------+-------------------+-------------------+---------+-----------+
ABC |2015-01-01 00:00:00|2015-02-01 00:00:00| 0| 0|
ABC |2015-03-01 00:00:00|2015-04-01 00:00:00| 0| 0|
ABC |2015-09-01 00:00:00|2015-10-01 00:00:00| 1| 1|
ABC |2015-10-01 00:00:00|2015-11-01 00:00:00| 0| 1|
ABC |2017-04-01 00:00:00|2017-05-01 00:00:00| 1| 2|
ABC |2017-05-01 00:00:00|2017-06-01 00:00:00| 0| 2|
XXX |2015-01-01 00:00:00|2015-02-01 00:00:00| 0| 0|
请注意,这是您提供的查询和结果的中文翻译。
英文:
Solution for PostgreSQL
Query with sample data
with dt as (
select * from (values
('ABC', DATE'2015-01-01'::timestamp,DATE'2015-02-01'::timestamp),
('ABC', DATE'2015-03-01',DATE'2015-04-01'),
('ABC', DATE'2015-09-01',DATE'2015-10-01'),
('ABC', DATE'2015-10-01',DATE'2015-11-01'),
('ABC', DATE'2017-04-01',DATE'2017-05-01'),
('ABC', DATE'2017-05-01',DATE'2017-06-01'),
('XXX', DATE'2015-01-01',DATE'2015-02-01')
) t(account_id, interval_start, interval_end)),
dt2 as (
select
account_id, interval_start, interval_end,
case
interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
when true then 1
else 0 end delta_act
from dt)
select
account_id, interval_start, interval_end, delta_act,
sum(delta_act) over(partition by account_id order by interval_start) reactivated
from dt2
order by 1,2
The subquery dt2
calculate the delta aktivation based on you one month inactivity rule.
case
interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
when true then 1
else 0 end delta_act
Finally you must only sum this delta using the window function
sum(delta_act) over(partition by account_id order by interval_start) reactivated
Result
account_id|interval_start |interval_end |delta_act|reactivated|
----------+-------------------+-------------------+---------+-----------+
ABC |2015-01-01 00:00:00|2015-02-01 00:00:00| 0| 0|
ABC |2015-03-01 00:00:00|2015-04-01 00:00:00| 0| 0|
ABC |2015-09-01 00:00:00|2015-10-01 00:00:00| 1| 1|
ABC |2015-10-01 00:00:00|2015-11-01 00:00:00| 0| 1|
ABC |2017-04-01 00:00:00|2017-05-01 00:00:00| 1| 2|
ABC |2017-05-01 00:00:00|2017-06-01 00:00:00| 0| 2|
XXX |2015-01-01 00:00:00|2015-02-01 00:00:00| 0| 0|
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论