如何在SQL中使用日期间隔进行分区

huangapple go评论66阅读模式
英文:

How to do partition by using date intervals in SQL

问题

以下是翻译好的代码部分:

case when
interval_start - lag(interval_end) over (partition by account_id order by interval_start) as prev_interval_end > 30 then 1 else 0 end

请注意,这只是原始代码的一部分,您需要将其与其他查询一起使用,以便正确计算“reactivated”列中的值。如果您需要更多帮助,请提供完整的查询或具体的问题描述。

英文:

I have the following sample data with 3 columns, i.e., account_id, interval_start, interval_end. And I want to create a reactivated column with the given output in the following table.

For each account_id, reactivated column should start with a 0 value and then I want to check if the previous interval_end and current interval_start value is more than a month then I want to increment that value as in to 1 and then next time when previous interval_end and current interval_start is more than a month, then it should be 2 and so on.

account_id interval_start interval_end reactivated
ABC 2015-01-01 00:00:00.000 2015-02-01 00:00:00.000 0
ABC 2015-02-01 00:00:00.000 2015-03-01 00:00:00.000 0
ABC 2015-03-01 00:00:00.000 2015-04-01 00:00:00.000 0
ABC 2015-09-01 00:00:00.000 2015-10-01 00:00:00.000 1
ABC 2015-10-01 00:00:00.000 2015-11-01 00:00:00.000 1
ABC 2017-04-01 00:00:00.000 2017-05-01 00:00:00.000 2
ABC 2017-05-01 00:00:00.000 2017-06-01 00:00:00.000 2
ABC 2017-06-01 00:00:00.000 2017-07-01 00:00:00.000 2
ABC 2017-07-01 00:00:00.000 2017-08-01 00:00:00.000 2
ABC 2019-03-01 00:00:00.000 2019-04-01 00:00:00.000 3
ABC 2019-04-01 00:00:00.000 2019-05-01 00:00:00.000 3
ABC 2019-05-01 00:00:00.000 2019-06-01 00:00:00.000 3
ABC 2021-07-01 00:00:00.000 2021-08-01 00:00:00.000 4
ABC 2021-08-01 00:00:00.000 2021-09-01 00:00:00.000 4
DEF 2015-03-01 00:00:00.000 2015-04-01 00:00:00.000 0
DEF 2015-04-01 00:00:00.000 2015-05-01 00:00:00.000 0
DEF 2017-06-01 00:00:00.000 2017-07-01 00:00:00.000 1
DEF 2017-07-01 00:00:00.000 2017-08-01 00:00:00.000 1
DEF 2019-05-01 00:00:00.000 2019-06-01 00:00:00.000 2

So far I'm only able to get 0s and 1s in the reactivated column by doing the following but its not what I want:

case when
interval_start - lag(interval_end) over (partition by account_id order by interval_start) as prev_interval_end > 30 then 1 else 0 end

How do I get the value to increment as well for each account_id please.

答案1

得分: 1

Here's the translated code portion:

你想要一个连续重叠行的运行计数。

使用`lag()`你已经走在正确的轨道上;你只需要再加一层窗口函数来对这些标志进行`sum`操作:

select t.*,
    sum(case when interval_start > lag_interval_end then 1 else 0 end) grp
from (
    select t.*,
        lag(interval_end) over (partition by account_id order by interval_start) as lag_interval_end
    from mytable t
) t

这是标准的SQL,在大多数支持窗口函数的数据库上都可以工作。根据你的实际数据库,可能有更简洁的方式来表达窗口的求和,例如在Postgres中

count(*) filter(where interval_start > lag_interval_end) grp

或者在MySQL中

sum(interval_start > lag_interval_end) grp

Redshift中,我们需要一个`rows`子句:

sum(case when interval_start > lag_interval_end then 1 else 0 end) over (
    partition by account_id 
    order by interval_start 
    rows unbounded preceding
) as  grp

Please note that the code is now translated into Chinese, as per your request.

英文:

You want a running count of consecutive rows that overlap.

You are on the right track with lag(); you just need another level of windowing to sum those flags:

select t.*,
    sum(case when interval_start > lag_interval_end then 1 else 0 end) grp
from (
    select t.*,
        lag(interval_end) over (partition by account_id order by interval_start) as lag_interval_end
    from mytable t
) t

This is standard SQL, that should work on most databases that support window functions. Depending on your actual database, there might be neater ways to express the window sum, eg in Postgres:

count(*) filter(where interval_start > lag_interval_end) grp

Or in MySQL:

sum(interval_start > lag_interval_end) grp

In Redshift, we need a rows clause:

sum(case when interval_start > lag_interval_end then 1 else 0 end) over (
    partition by account_id 
    order by interval_start 
    rows unbounded preceding
) as  grp

答案2

得分: 0

在上面的答案中添加。由于我使用Redshift,对我有效的是:

sum(case when interval_start > lag_interval_end then 1 else 0 end) over (partition by account_id order by interval_start rows unbounded preceding) as grp
英文:

Adding to above answer. Since I'm using Redshift what worked for me is:

sum(case when interval_start > lag_interval_end then 1 else 0 end) over (partition by account_id order by interval_start rows unbounded preceding) as  grp

答案3

得分: 0

以下是您要翻译的内容:

解决方案 PostgreSQL

带有示例数据的查询

with dt as (
select * from (values
('ABC', DATE'2015-01-01'::timestamp,DATE'2015-02-01'::timestamp),
('ABC', DATE'2015-03-01',DATE'2015-04-01'),
('ABC', DATE'2015-09-01',DATE'2015-10-01'),
('ABC', DATE'2015-10-01',DATE'2015-11-01'),
('ABC', DATE'2017-04-01',DATE'2017-05-01'),
('ABC', DATE'2017-05-01',DATE'2017-06-01'),
('XXX', DATE'2015-01-01',DATE'2015-02-01')
) t(account_id, interval_start, interval_end)),
dt2 as (
select 
  account_id, interval_start, interval_end,
  case 
   interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
  when true then 1
  else 0 end delta_act
from  dt)
select 
 account_id, interval_start, interval_end, delta_act,
 sum(delta_act) over(partition by account_id order by interval_start) reactivated
from dt2
order by 1,2

子查询 dt2 根据您的一个月不活动规则计算 delta activation

case 
   interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
  when true then 1
  else 0 end delta_act

最后,您只需使用窗口函数 sum 这个 delta。

sum(delta_act) over(partition by account_id order by interval_start) reactivated

结果

account_id|interval_start     |interval_end       |delta_act|reactivated|
----------+-------------------+-------------------+---------+-----------+
ABC       |2015-01-01 00:00:00|2015-02-01 00:00:00|        0|          0|
ABC       |2015-03-01 00:00:00|2015-04-01 00:00:00|        0|          0|
ABC       |2015-09-01 00:00:00|2015-10-01 00:00:00|        1|          1|
ABC       |2015-10-01 00:00:00|2015-11-01 00:00:00|        0|          1|
ABC       |2017-04-01 00:00:00|2017-05-01 00:00:00|        1|          2|
ABC       |2017-05-01 00:00:00|2017-06-01 00:00:00|        0|          2|
XXX       |2015-01-01 00:00:00|2015-02-01 00:00:00|        0|          0|

请注意,这是您提供的查询和结果的中文翻译。

英文:

Solution for PostgreSQL

Query with sample data

with dt as (
select * from (values
('ABC', DATE'2015-01-01'::timestamp,DATE'2015-02-01'::timestamp),
('ABC', DATE'2015-03-01',DATE'2015-04-01'),
('ABC', DATE'2015-09-01',DATE'2015-10-01'),
('ABC', DATE'2015-10-01',DATE'2015-11-01'),
('ABC', DATE'2017-04-01',DATE'2017-05-01'),
('ABC', DATE'2017-05-01',DATE'2017-06-01'),
('XXX', DATE'2015-01-01',DATE'2015-02-01')
) t(account_id, interval_start, interval_end)),
dt2 as (
select 
  account_id, interval_start, interval_end,
  case 
   interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
  when true then 1
  else 0 end delta_act
from  dt)
select 
 account_id, interval_start, interval_end, delta_act,
 sum(delta_act) over(partition by account_id order by interval_start) reactivated
from dt2
order by 1,2

The subquery dt2 calculate the delta aktivation based on you one month inactivity rule.

  case 
   interval_start - lag(interval_end) over(partition by account_id order by interval_start) >= interval '1 month'
  when true then 1
  else 0 end delta_act  

Finally you must only sum this delta using the window function

 sum(delta_act) over(partition by account_id order by interval_start) reactivated

Result

account_id|interval_start     |interval_end       |delta_act|reactivated|
----------+-------------------+-------------------+---------+-----------+
ABC       |2015-01-01 00:00:00|2015-02-01 00:00:00|        0|          0|
ABC       |2015-03-01 00:00:00|2015-04-01 00:00:00|        0|          0|
ABC       |2015-09-01 00:00:00|2015-10-01 00:00:00|        1|          1|
ABC       |2015-10-01 00:00:00|2015-11-01 00:00:00|        0|          1|
ABC       |2017-04-01 00:00:00|2017-05-01 00:00:00|        1|          2|
ABC       |2017-05-01 00:00:00|2017-06-01 00:00:00|        0|          2|
XXX       |2015-01-01 00:00:00|2015-02-01 00:00:00|        0|          0|

huangapple
  • 本文由 发表于 2023年5月22日 23:04:41
  • 转载请务必保留本文链接:https://go.coder-hub.com/76307526.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定