Filling in missing calendar dates in Snowflake SQL
Question
I have a Snowflake table A with columns id, date_yyyymmdd, amount, and hours, as shown below. Not all calendar dates are populated.
id | date_yyyymmdd | amount | hours |
---|---|---|---|
1 | 20230101 | 1428.95 | 11 |
1 | 20230103 | 1791.29 | 13 |
2 | 20230101 | 2516.84 | 15 |
2 | 20230105 | 3046.08 | 5 |
3 | 20230102 | 7137.92 | 11 |
3 | 20230103 | 1104.35 | 1 |
3 | 20230104 | 25 | 1 |
I would like to fill in the missing calendar dates between two variables, start_date and end_date, and produce table B as shown below, with amount and hours set to 0 for those dates. In the example below, the start date is 20230101 and the end date is 20230105. Is there a Snowflake calendar function that fills in the missing dates between two dates?
id | date_yyyymmdd | amount | hours |
---|---|---|---|
1 | 20230101 | 1428.95 | 11 |
1 | 20230102 | 0 | 0 |
1 | 20230103 | 1791.29 | 13 |
1 | 20230104 | 0 | 0 |
1 | 20230105 | 0 | 0 |
2 | 20230101 | 2516.84 | 15 |
2 | 20230102 | 0 | 0 |
2 | 20230103 | 0 | 0 |
2 | 20230104 | 0 | 0 |
2 | 20230105 | 3046.08 | 5 |
3 | 20230101 | 0 | 0 |
3 | 20230102 | 7137.92 | 11 |
3 | 20230103 | 1104.35 | 1 |
3 | 20230104 | 25 | 1 |
3 | 20230105 | 0 | 0 |
Answer 1
Score: 1
You can generate the full date range for each id, exclude the dates that already exist, and then stack the missing dates on top of your existing data. There are four key steps, numbered in the code comments. Here's a demo.
Data
create or replace temporary table t as
with cte (id, dt, amt, hr) as
(select 1,'2023-01-03'::date,100,5 union all
select 1,'2023-01-05'::date,130,6 union all
select 2,'2023-01-03'::date,160,1 union all
select 2,'2023-01-07'::date,100,4)
select *
from cte;
Code
set (min_dt, range_dt) = (select min(dt), max(dt) - min(dt) from t);
with missing_dates (id, dt) as
(select a.id, $min_dt+b.index -- 2. add row index to generate dates
from (select distinct id from t) a ,
lateral flatten(array_generate_range(0,$range_dt+1)) b -- 1. generate as many rows as date range
except -- 3. exclude dates that already exist
select id, dt
from t)
select id, dt, amt, hr
from t
union all --4. stack them up
select id, dt, 0, 0
from missing_dates
order by id, dt;
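The query above takes its window from the data itself (the minimum and maximum dt in t). If the window should instead come from the start_date and end_date variables mentioned in the question, one possible adjustment is to derive min_dt and range_dt from them and leave the rest of the query unchanged. This is only a sketch: the variable values are copied from the question, and start_date and end_date are assumed to be session variables of type DATE.

```sql
-- start_date / end_date stand in for the asker's variables (values from the question)
set start_date = '2023-01-01'::date;
set end_date   = '2023-01-05'::date;

-- the two variables the query above expects, now derived from the requested window
set (min_dt, range_dt) = (select $start_date, datediff('day', $start_date, $end_date));
```

Note that the first branch of the union all still returns every row of t, so rows that fall outside the window would need an extra filter such as where dt between $start_date and $end_date.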
Answer 2
Score: 1
Here is yet another way to do it, given that you noted you "have variables" for the start and end of the date range:
set start_date = '2023-01-01'::date;
set end_date = '2023-01-05'::date;
Then we can use them:
with fake_data(id, _date, amount, hours) as (
select * from values
(1, '2023-01-01'::date, 1428.95, 11),
(1, '2023-01-03'::date, 1791.29, 13),
(2, '2023-01-01'::date, 2516.84, 15),
(2, '2023-01-05'::date, 3046.08, 5),
(3, '2023-01-02'::date, 7137.92, 11),
(3, '2023-01-03'::date, 1104.35, 1),
(3, '2023-01-04'::date, 25, 1)
), date_range as (
select
dateadd('day',
row_number() over (order by null)-1,
$start_date -- put your start date variable here
) as _date
from table(generator(rowcount=>1000))
qualify _date <= $end_date -- put your end date variable here
), dist_ids as (
select distinct id from fake_data
)
select
i.id,
r._date as date_yyyymmdd,
zeroifnull(d.amount) as amount,
zeroifnull(d.hours) as hours
from dist_ids as i
cross join date_range as r
left join fake_data as d
on i.id = d.id
and r._date = d._date
order by 1,2
If you don't already have variables, you can derive the range from the data, as the other answers show, but in a single statement like so:
with fake_data(id, _date, amount, hours) as (
select * from values
(1, '2023-01-01'::date, 1428.95, 11),
(1, '2023-01-03'::date, 1791.29, 13),
(2, '2023-01-01'::date, 2516.84, 15),
(2, '2023-01-05'::date, 3046.08, 5),
(3, '2023-01-02'::date, 7137.92, 11),
(3, '2023-01-03'::date, 1104.35, 1),
(3, '2023-01-04'::date, 25, 1)
), min_max as (
select min(_date) as min_d
,max(_date) - min_d + 1 as days
from fake_data
), date_range as (
select
dateadd('day', r.value, m.min_d ) as _date
from min_max as m,
lateral flatten(array_generate_range(0, m.days)) as r
), dist_ids as (
select distinct id from fake_data
)
select
i.id,
r._date as date_yyyymmdd,
zeroifnull(d.amount) as amount,
zeroifnull(d.hours) as hours
from dist_ids as i
cross join date_range as r
left join fake_data as d
on i.id = d.id
and r._date = d._date
order by 1,2
This again gives the same results.
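The sample tables in the question show date_yyyymmdd as values like 20230101, while the queries here work with DATE values. The question does not say how the column is stored; if it is a number in yyyymmdd form (an assumption), a conversion in each direction might look like the sketch below, using literal values purely for illustration.

```sql
select
    -- number in yyyymmdd form -> DATE (assumes the column is numeric)
    to_date(to_varchar(20230103), 'YYYYMMDD') as as_date,
    -- DATE -> number in yyyymmdd form, e.g. for the final output column
    to_number(to_char('2023-01-03'::date, 'YYYYMMDD')) as as_yyyymmdd;
```

The first expression would be applied to the source column before joining against the generated dates, and the second to the generated date in the final select list.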
Answer 3
Score: 0
Very similar to the answer from @Radagast, but this one could be wrapped in a view. Table t1 would be replaced with your actual Snowflake table:
with t0 as (
select
'2023-01-01'::date as start_date, -- cast so the bounds are real DATE values
'2023-01-05'::date as end_date,
datediff('day', start_date, end_date) + 1 range -- number of days, inclusive of both ends
) ,t1(id, date_yyyymmdd, amount, hours) as (
select * from values
(1, '2023-01-01'::date, 1428.95, 11),
(1, '2023-01-03'::date, 1791.29, 13),
(2, '2023-01-01'::date, 2516.84, 15),
(2, '2023-01-05'::date, 3046.08, 5),
(3, '2023-01-02'::date, 7137.92, 11),
(3, '2023-01-03'::date, 1104.35, 1),
(3, '2023-01-04'::date, 25, 1)
), t2 as (
select distinct -- distinct applies to the whole (id, date) row
t1.id,
dateadd('day', a.index, t0.start_date) date_yyyymmdd
from t0, t1,
lateral flatten(array_generate_range(0, t0.range)) a
)
select
t2.id,
t2.date_yyyymmdd,
coalesce(t1.amount, 0) amount,
coalesce(t1.hours, 0) hours
from t2 left join t1
on t2.id = t1.id
and t2.date_yyyymmdd = t1.date_yyyymmdd
order by 1,2
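To make the "wrapped in a view" idea concrete, here is a minimal sketch. The names table_a (the source table) and table_b_v (the view) are hypothetical, the window is hard-coded as in the query above, and table_a.date_yyyymmdd is assumed to be a DATE column.

```sql
-- Hypothetical names: table_a is the real source table, table_b_v is the new view.
-- Assumes table_a.date_yyyymmdd is a DATE column.
create or replace view table_b_v as
with bounds as (
    select '2023-01-01'::date as start_date,
           '2023-01-05'::date as end_date
), date_range as (
    -- one row per calendar day from start_date through end_date
    select dateadd('day', r.index, b.start_date) as dt
    from bounds as b,
         lateral flatten(array_generate_range(0, datediff('day', b.start_date, b.end_date) + 1)) as r
), ids as (
    select distinct id from table_a
)
select
    i.id,
    d.dt as date_yyyymmdd,
    coalesce(src.amount, 0) as amount,  -- 0 where the id has no row for that day
    coalesce(src.hours, 0)  as hours
from ids as i
cross join date_range as d
left join table_a as src
    on src.id = i.id
   and src.date_yyyymmdd = d.dt;

-- Sorting is left to the caller rather than baked into the view.
select * from table_b_v order by id, date_yyyymmdd;
```

With the question's sample data (3 ids, a 5-day window) the view would return 15 rows. Like the query above, it builds the id list separately from the date list, so the left join only has to match existing rows.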