2023-08-04 23:56:47

Filling in missing calendar dates in Snowflake SQL

Question

I have a Snowflake table A with columns id, date_yyyymmdd, amount, and hours, as shown below. Not all calendar dates are populated.

id	date_yyyymmdd	amount	hours
1	20230101	1428.95	11
1	20230103	1791.29	13
2	20230101	2516.84	15
2	20230105	3046.08	5
3	20230102	7137.92	11
3	20230103	1104.35	1
3	20230104	25	1

I would like to fill in the missing calendar dates between two variables, start_date and end_date, and produce table B as shown below, with amount and hours set to 0 for those dates. In the example below, the start date is 20230101 and the end date is 20230105. Is there a Snowflake calendar function that fills in the missing dates between two dates?

id	date_yyyymmdd	amount	hours
1	20230101	1428.95	11
1	20230102	0	0
1	20230103	1791.29	13
1	20230104	0	0
1	20230105	0	0
2	20230101	2516.84	15
2	20230102	0	0
2	20230103	0	0
2	20230104	0	0
2	20230105	3046.08	5
3	20230101	0	0
3	20230102	7137.92	11
3	20230103	1104.35	1
3	20230104	25	1
3	20230105	0	0

Answer 1

Score: 1

You can generate the full date range for each id, exclude the dates that already exist, and then stack the missing dates on top of your existing data. There are four key steps, numbered in the code comments. Here's a demo.

Data

create or replace temporary table t as
with cte (id, dt, amt, hr) as
(select 1,'2023-01-03'::date,100,5 union all
 select 1,'2023-01-05'::date,130,6 union all
 select 2,'2023-01-03'::date,160,1 union all
 select 2,'2023-01-07'::date,100,4)
select *
from cte;

Code

set (min_dt, range_dt) = (select min(dt), max(dt) - min(dt) from t);
with missing_dates (id, dt) as
(select a.id, $min_dt+b.index -- 2. add row index to generate dates
 from (select distinct id from t) a ,
 lateral flatten(array_generate_range(0,$range_dt+1)) b -- 1. generate as many rows as date range
 except -- 3. exclude dates that already exist
 select id, dt
 from t)
select id, dt, amt, hr
from t
union all --4. stack them up
select id, dt, 0, 0
from missing_dates
order by id, dt;
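
As a way to check what the query above is expected to return, the same four steps can be mirrored outside Snowflake. The following is only a plain-Python sketch of the logic, not Snowflake code; the function name fill_by_except_union is hypothetical, and the rows are the sample rows of table t.

```python
from datetime import date, timedelta

# Same sample rows as table t in the answer: (id, dt, amt, hr).
rows = [
    (1, date(2023, 1, 3), 100, 5),
    (1, date(2023, 1, 5), 130, 6),
    (2, date(2023, 1, 3), 160, 1),
    (2, date(2023, 1, 7), 100, 4),
]


def fill_by_except_union(rows):
    """Mirror the answer's four steps with Python sets and lists (illustrative only)."""
    min_dt = min(r[1] for r in rows)
    range_dt = (max(r[1] for r in rows) - min_dt).days
    ids = {r[0] for r in rows}
    # Steps 1 and 2: one (id, date) pair per id and per day in the range.
    all_pairs = {
        (i, min_dt + timedelta(days=n))
        for i in ids
        for n in range(range_dt + 1)
    }
    # Step 3: the EXCEPT, dropping pairs that already exist.
    missing = all_pairs - {(r[0], r[1]) for r in rows}
    # Step 4: the UNION ALL with zero-filled rows, then ORDER BY id, dt.
    return sorted(rows + [(i, d, 0, 0) for i, d in missing])


filled = fill_by_except_union(rows)
for row in filled:
    print(row)
```

Note that the range here comes from the minimum and maximum dates across all ids in the data (2023-01-03 to 2023-01-07), not from separate start_date and end_date variables.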

Answer 2

Score: 1

Here is yet another way to do it, given that you noted you "have variables" for the start/end date range:

set start_date = '2023-01-01'::date;
set end_date = '2023-01-05'::date;

Then we can use them:

with fake_data(id, _date, amount, hours) as (
    select * from values
        (1, '2023-01-01'::date, 1428.95, 11),
        (1, '2023-01-03'::date, 1791.29, 13),
        (2, '2023-01-01'::date, 2516.84, 15),
        (2, '2023-01-05'::date, 3046.08, 5),
        (3, '2023-01-02'::date, 7137.92, 11),
        (3, '2023-01-03'::date, 1104.35, 1),
        (3, '2023-01-04'::date, 25, 1)
), date_range as (
    select 
        dateadd('day',
            row_number() over (order by null)-1,
            $start_date -- put your start date variable here
        ) as _date 
    from table(generator(rowcount=>1000)) -- caps the range at 1000 days
    qualify _date <= $end_date -- put your end date variable here
), dist_ids as (
    select distinct id from fake_data
)
select
    i.id,
    r._date as date_yyyymmdd,
    zeroifnull(d.amount) as amount,
    zeroifnull(d.hours) as hours
from dist_ids as i 
cross join date_range as r
left join fake_data as d
  on i.id = d.id 
      and r._date = d._date
order by 1,2

If you don't already have variables, you can derive the range from the data itself, as the other answers show, but in a single statement like so:

with fake_data(id, _date, amount, hours) as (
    select * from values
        (1, '2023-01-01'::date, 1428.95, 11),
        (1, '2023-01-03'::date, 1791.29, 13),
        (2, '2023-01-01'::date, 2516.84, 15),
        (2, '2023-01-05'::date, 3046.08, 5),
        (3, '2023-01-02'::date, 7137.92, 11),
        (3, '2023-01-03'::date, 1104.35, 1),
        (3, '2023-01-04'::date, 25, 1)
), min_max as (
    select  min(_date) as min_d
        ,max(_date) - min_d + 1 as days
    from fake_data
), date_range as (
    select 
        dateadd('day', r.value, m.min_d ) as _date 
    from min_max as m,
     lateral flatten(array_generate_range(0, m.days)) as r
), dist_ids as (
    select distinct id from fake_data
)
select
    i.id,
    r._date as date_yyyymmdd,
    zeroifnull(d.amount) as amount,
    zeroifnull(d.hours) as hours
from dist_ids as i 
cross join date_range as r
left join fake_data as d
  on i.id = d.id 
      and r._date = d._date
order by 1,2

This again gives the same results.
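
Both queries in this answer follow the same pattern: build one row per id and per day in the range, then left join the real data and replace NULLs with 0. The sketch below mirrors that pattern in plain Python so the expected table B can be checked by hand. It is not Snowflake code; the function name fill_missing_dates is hypothetical, and it assumes the dates are stored as yyyymmdd integers as in the question, with the second row of table A read as 20230103 so that it matches table B.

```python
from datetime import datetime, timedelta

# Table A from the question: (id, date_yyyymmdd, amount, hours).
# The second row is taken as 20230103, which is what table B shows.
table_a = [
    (1, 20230101, 1428.95, 11),
    (1, 20230103, 1791.29, 13),
    (2, 20230101, 2516.84, 15),
    (2, 20230105, 3046.08, 5),
    (3, 20230102, 7137.92, 11),
    (3, 20230103, 1104.35, 1),
    (3, 20230104, 25, 1),
]


def fill_missing_dates(rows, start_yyyymmdd, end_yyyymmdd):
    """Cross join ids with the date range, then left join the data (illustrative only)."""
    start = datetime.strptime(str(start_yyyymmdd), "%Y%m%d").date()
    end = datetime.strptime(str(end_yyyymmdd), "%Y%m%d").date()
    # Counterpart of the date_range CTE: every day from start to end, inclusive.
    days = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    # Counterpart of the dist_ids CTE.
    ids = sorted({r[0] for r in rows})
    # Lookup that plays the role of the left join on (id, date).
    existing = {(r[0], r[1]): (r[2], r[3]) for r in rows}
    result = []
    for i in ids:            # cross join: every id ...
        for d in days:       # ... with every day in the range
            key = int(d.strftime("%Y%m%d"))
            # Counterpart of zeroifnull: default to 0 when no row matched.
            amount, hours = existing.get((i, key), (0, 0))
            result.append((i, key, amount, hours))
    return result


table_b = fill_missing_dates(table_a, 20230101, 20230105)
for row in table_b:
    print(row)
```

With three ids and five days, the result has 15 rows, one per id and day, in the same order as table B.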

Answer 3

Score: 0

Very similar to the answer from @Radagast, but this one could be wrapped in a view. Table t1 would be replaced with your actual Snowflake table:

with t0 as (
    select
        '2023-01-01'::date as start_date,
        '2023-01-05'::date as end_date,
        datediff('day', start_date, end_date) + 1 range
) ,t1(id, date_yyyymmdd, amount, hours) as (
    select * from values
    (1, '2023-01-01'::date, 1428.95, 11),
    (1, '2023-01-03'::date, 1791.29, 13),
    (2, '2023-01-01'::date, 2516.84, 15),
    (2, '2023-01-05'::date, 3046.08, 5),
    (3, '2023-01-02'::date, 7137.92, 11),
    (3, '2023-01-03'::date, 1104.35, 1),
    (3, '2023-01-04'::date, 25, 1)
), t2 as (
    select
        distinct(t1.id),
        dateadd('day', a.index, t0.start_date) date_yyyymmdd
    from t0, t1,
    lateral flatten(array_generate_range(0, t0.range)) a
)
select
    t2.id,
    t2.date_yyyymmdd,
    coalesce(t1.amount, 0) amount,
    coalesce(t1.hours, 0) hours
from t2 left join t1
on t2.id = t1.id
and t2.date_yyyymmdd = t1.date_yyyymmdd
order by 1,2
