Filling in missing calendar dates in Snowflake SQL

Question

I have a Snowflake table A with columns id, date_yyyymmdd, amount and hours, as shown below. Not all calendar dates are populated.

id  date_yyyymmdd  amount   hours
1   20230101       1428.95  11
1   20230103       1791.29  13
2   20230101       2516.84  15
2   20230105       3046.08  5
3   20230102       7137.92  11
3   20230103       1104.35  1
3   20230104       25       1

I would like to fill in the missing calendar dates between two variables, start_date and end_date, and produce table B as shown below, with amount and hours set to 0 for those dates. In the example below, the start date is 20230101 and the end date is 20230105. Is there a Snowflake calendar function available to fill in the missing dates between two dates?

id  date_yyyymmdd  amount   hours
1   20230101       1428.95  11
1   20230102       0        0
1   20230103       1791.29  13
1   20230104       0        0
1   20230105       0        0
2   20230101       2516.84  15
2   20230102       0        0
2   20230103       0        0
2   20230104       0        0
2   20230105       3046.08  5
3   20230101       0        0
3   20230102       7137.92  11
3   20230103       1104.35  1
3   20230104       25       1
3   20230105       0        0

Answer 1

Score: 1

You can generate the full date range for each id, exclude the dates that already exist, and then stack the missing dates on top of your existing data. There are four key steps, numbered in the code comments below. Here's a demo:

Data

create or replace temporary table t as

with cte (id, dt, amt, hr) as

(select 1,'2023-01-03'::date,100,5 union all
 select 1,'2023-01-05'::date,130,6 union all
 select 2,'2023-01-03'::date,160,1 union all
 select 2,'2023-01-07'::date,100,4)

select * 
from cte;

Code

set (min_dt, range_dt) = (select min(dt), max(dt) - min(dt) from t);

with missing_dates (id, dt) as

(select a.id, $min_dt+b.index -- 2. add row index to generate dates
 from (select distinct id from t) a ,
 lateral flatten(array_generate_range(0,$range_dt+1)) b -- 1. generate as many rows as date range
 except -- 3. exclude dates that already exist
 select id, dt
 from t)

select id, dt, amt, hr
from t
union all --4. stack them up
select id, dt, 0, 0
from missing_dates
order by id, dt;
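
The same four steps can be tried without a Snowflake account. The following is a minimal sketch, assuming Python's standard-library sqlite3 module as a stand-in: a recursive CTE with SQLite's date() function replaces array_generate_range and flatten, and dates are stored as ISO text. The `bounds` and `calendar` names are illustrative and not part of the answer above; this is not Snowflake syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (id integer, dt text, amt real, hr integer);
    insert into t values
        (1, '2023-01-03', 100, 5),
        (1, '2023-01-05', 130, 6),
        (2, '2023-01-03', 160, 1),
        (2, '2023-01-07', 100, 4);
""")

query = """
with recursive bounds(min_dt, max_dt) as (
    select min(dt), max(dt) from t
),
calendar(dt) as (                 -- steps 1-2: one row per day in the range
    select min_dt from bounds
    union all
    select date(calendar.dt, '+1 day')
    from calendar, bounds
    where calendar.dt < bounds.max_dt
),
missing_dates(id, dt) as (
    select a.id, c.dt
    from (select distinct id from t) as a
    cross join calendar as c
    except                        -- step 3: exclude dates that already exist
    select id, dt from t
)
select id, dt, amt, hr from t
union all                         -- step 4: stack them up
select id, dt, 0, 0 from missing_dates
order by id, dt
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each id ends up with one row per day between the earliest and latest date found anywhere in the table, so the range is shared across ids rather than computed per id.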

Answer 2

Score: 1

Here is yet another way to do it, given that you noted you already have variables for the start and end of the date range:

set start_date = '2023-01-01'::date;
set end_date = '2023-01-05'::date;

Then we can use them:

with fake_data(id, _date, amount, hours) as (
    select * from values
        (1, '2023-01-01'::date, 1428.95, 11),
        (1, '2023-01-03'::date, 1791.29, 13),
        (2, '2023-01-01'::date, 2516.84, 15),
        (2, '2023-01-05'::date, 3046.08, 5),
        (3, '2023-01-02'::date, 7137.92, 11),
        (3, '2023-01-03'::date, 1104.35, 1),
        (3, '2023-01-04'::date, 25, 1)
), date_range as (
    select 
        dateadd('day',
            row_number() over (order by null)-1,
            $start_date -- put your start date variable here
        ) as _date 
    from table(generator(rowcount=>1000)) -- rowcount caps the range at 1000 days
    qualify _date <= $end_date -- put your end date variable here
), dist_ids as (
    select distinct id from fake_data
)
select
    i.id,
    r._date as date_yyyymmdd,
    zeroifnull(d.amount) as amount,
    zeroifnull(d.hours) as hours
from dist_ids as i 
cross join date_range as r
left join fake_data as d
  on i.id = d.id 
      and r._date = d._date
order by 1,2

If you don't already have variables, you can derive the range from the data, as the other answers show, but in a single statement like so:

with fake_data(id, _date, amount, hours) as (
    select * from values
        (1, '2023-01-01'::date, 1428.95, 11),
        (1, '2023-01-03'::date, 1791.29, 13),
        (2, '2023-01-01'::date, 2516.84, 15),
        (2, '2023-01-05'::date, 3046.08, 5),
        (3, '2023-01-02'::date, 7137.92, 11),
        (3, '2023-01-03'::date, 1104.35, 1),
        (3, '2023-01-04'::date, 25, 1)
), min_max as (
    select  min(_date) as min_d
        ,max(_date) - min_d + 1 as days
    from fake_data
), date_range as (
    select 
        dateadd('day', r.value, m.min_d ) as _date 
    from min_max as m,
     lateral flatten(array_generate_range(0, m.days)) as r
), dist_ids as (
    select distinct id from fake_data
)
select
    i.id,
    r._date as date_yyyymmdd,
    zeroifnull(d.amount) as amount,
    zeroifnull(d.hours) as hours
from dist_ids as i 
cross join date_range as r
left join fake_data as d
  on i.id = d.id 
      and r._date = d._date
order by 1,2

This again gives the same results.
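
To experiment with the cross join plus left join pattern locally, here is a minimal sketch, assuming Python's standard-library sqlite3 module in place of Snowflake. The function name `fill_missing_dates` and its bound parameters are hypothetical; a recursive CTE stands in for generator, and coalesce stands in for zeroifnull.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table fake_data (id integer, _date text, amount real, hours integer);
    insert into fake_data values
        (1, '2023-01-01', 1428.95, 11),
        (1, '2023-01-03', 1791.29, 13),
        (2, '2023-01-01', 2516.84, 15),
        (2, '2023-01-05', 3046.08, 5),
        (3, '2023-01-02', 7137.92, 11),
        (3, '2023-01-03', 1104.35, 1),
        (3, '2023-01-04', 25, 1);
""")

def fill_missing_dates(conn, start_date, end_date):
    """Return one row per (id, day) in [start_date, end_date], zero-filled."""
    query = """
    with recursive date_range(_date) as (
        select :start_date where :start_date <= :end_date
        union all
        select date(_date, '+1 day') from date_range where _date < :end_date
    ),
    dist_ids as (
        select distinct id from fake_data
    )
    select i.id,
           r._date as date_yyyymmdd,
           coalesce(d.amount, 0) as amount,
           coalesce(d.hours, 0) as hours
    from dist_ids as i
    cross join date_range as r
    left join fake_data as d
      on d.id = i.id and d._date = r._date
    order by 1, 2
    """
    params = {"start_date": start_date, "end_date": end_date}
    return conn.execute(query, params).fetchall()

rows = fill_missing_dates(conn, "2023-01-01", "2023-01-05")
for row in rows:
    print(row)
```

The cross join builds the complete id-by-date grid first, so the left join only has to attach existing amounts; any grid cell without a match falls through to the zero default.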

Answer 3

Score: 0

Very similar to the answer from @Radagast, but this one could be wrapped in a view. Table t1 would be replaced with your actual Snowflake table:

with t0 as (
    select
        '2023-01-01'::date as start_date,
        '2023-01-05'::date as end_date,
        datediff('day', start_date, end_date) + 1 as num_days -- inclusive day count
), t1(id, date_yyyymmdd, amount, hours) as (
    -- sample data; replace with your actual table
    select * from values
    (1, '2023-01-01'::date, 1428.95, 11),
    (1, '2023-01-03'::date, 1791.29, 13),
    (2, '2023-01-01'::date, 2516.84, 15),
    (2, '2023-01-05'::date, 3046.08, 5),
    (3, '2023-01-02'::date, 7137.92, 11),
    (3, '2023-01-03'::date, 1104.35, 1),
    (3, '2023-01-04'::date, 25, 1)
), t2 as (
    -- every distinct id paired with every date in the range
    select distinct
        t1.id,
        dateadd('day', a.index, t0.start_date) as date_yyyymmdd
    from t0, t1,
    lateral flatten(array_generate_range(0, t0.num_days)) a
)
select
    t2.id,
    t2.date_yyyymmdd,
    coalesce(t1.amount, 0) as amount,
    coalesce(t1.hours, 0) as hours
from t2 left join t1
on t2.id = t1.id
and t2.date_yyyymmdd = t1.date_yyyymmdd
order by 1,2
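
The "wrap it in a view" idea can be sketched as follows, assuming Python's standard-library sqlite3 module rather than Snowflake. The view name `t1_filled` and the `calendar` CTE are hypothetical; as in the answer, the start and end dates are hard-coded in t0, and a recursive CTE replaces array_generate_range and flatten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (id integer, date_yyyymmdd text, amount real, hours integer);
    insert into t1 values
        (1, '2023-01-01', 1428.95, 11),
        (1, '2023-01-03', 1791.29, 13),
        (2, '2023-01-01', 2516.84, 15),
        (2, '2023-01-05', 3046.08, 5),
        (3, '2023-01-02', 7137.92, 11),
        (3, '2023-01-03', 1104.35, 1),
        (3, '2023-01-04', 25, 1);

    create view t1_filled as
    with recursive t0(start_date, end_date) as (
        select '2023-01-01', '2023-01-05'
    ),
    calendar(d) as (              -- one row per day from start_date to end_date
        select start_date from t0
        union all
        select date(d, '+1 day') from calendar, t0 where d < t0.end_date
    ),
    t2 as (                       -- every distinct id paired with every day
        select ids.id as id, calendar.d as date_yyyymmdd
        from (select distinct id from t1) as ids
        cross join calendar
    )
    select t2.id as id,
           t2.date_yyyymmdd as date_yyyymmdd,
           coalesce(t1.amount, 0) as amount,
           coalesce(t1.hours, 0) as hours
    from t2 left join t1
      on t2.id = t1.id
     and t2.date_yyyymmdd = t1.date_yyyymmdd;
""")

# The view can now be queried like an ordinary table.
rows = conn.execute(
    "select * from t1_filled order by id, date_yyyymmdd"
).fetchall()
for row in rows:
    print(row)
```

Because the range lives inside the view definition, changing the dates means recreating the view; callers only ever see the filled result.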

huangapple
  • This article was posted on August 4, 2023 at 23:56:47
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76837494.html