Snowflake / Snowpark 填充日期范围之间的间隙与当前值

huangapple go评论78阅读模式
英文:

Snowflake / Snowpark fill gaps in between date ranges with current value

问题

在Snowflake中,您想要填充monthend_dates之间的间隙,并分配new_ind,直到它发生变化。以下是修改后的查询,以实现您的要求:

WITH min_max_date AS (
    SELECT
        min(sales_date) AS min_date,
        max(sales_date) AS max_date
    FROM sales
),
generated_dates AS (
    SELECT
        last_day(dateadd('month', row_number() over (ORDER BY null), min_date)) as _monthend
    FROM TABLE(GENERATOR(ROWCOUNT => 100000))
    CROSS JOIN min_max_date QUALIFY _monthend <= max_date
),
ind_history AS (
    SELECT
        id,
        new_ind,
        sales_date
    FROM public.customer
)
SELECT
    g._monthend AS monthend,
    COALESCE(s.new_ind, lag(s.new_ind) OVER (PARTITION BY s.id ORDER BY s.sales_date)) AS new_ind
FROM generated_dates g
LEFT JOIN ind_history s ON g._monthend = last_day(s.sales_date)
ORDER BY g._monthend;

这个查询将生成的日期(_monthend)与public.customer中的数据进行连接,并使用COALESCELAG函数来填充new_ind的空隙,直到它发生变化。最后,按_monthend对结果进行排序。

英文:

I have this query in Snowflake:

WITH min_max_date AS (
SELECT
    min(sales_date) AS min_date,
    max(sales_date) AS max_date
FROM sales),
generated_dates AS (
    SELECT
        last_day(dateadd(&#39;month&#39;, row_number() over (ORDER BY seq4())-1, min_date)) as _monthend
    FROM TABLE(GENERATOR(ROWCOUNT =&gt; 100000))
    CROSS JOIN min_max_date QUALIFY _monthend &lt;= max_date
)
select
g._monthend as monthend,
h.id,
h.new_ind
from generated_dates g
left join public.customer h on g._monthend = last_day(h.change_date)
order by _monthend;

Which returns

monthend_date id new_ind
2010-09-30 123 1
2011-03-31 123 2
2012-07-31 123 3

I would like to fill in the gaps between monthend_dates with all the generated _monthend, and assign with new_ind until it changes.

So the output should look like:

monthend_date id new_ind
2010-09-30 123 1
2010-10-31 123 1
2010-11-30 123 1
2010-12-31 123 1
2011-01-31 123 1
2011-02-31 123 1
2011-03-31 123 2

I tried using a coalesce and lag function, but it didnt work out:

Update: using the query as suggested by the comments, this still returns NULL on the _monthend values, except when there is a direct match on a single _monthend value.

WITH min_max_date AS (
SELECT
    min(sales_date) AS min_date,
    max(sales_date) AS max_date
FROM sales),
generated_dates AS (
    SELECT
        last_day(dateadd(&#39;month&#39;, row_number() over (ORDER BY null), min_date)) as _monthend
    FROM TABLE(GENERATOR(ROWCOUNT =&gt; 100000))
    CROSS JOIN min_max_date QUALIFY _monthend &lt;= max_date
),
ind_history AS (
    SELECT
        id,
        new_ind,
        sales_date
    FROM public.customer
)
SELECT
    g._monthend AS monthend,
    COALESCE(s.new_ind,
              lag(s.new_ind) OVER (PARTITION BY id ORDER BY sales_date)) AS new_ind
FROM public.customer s
LEFT JOIN ind_history s ON g._monthend = last_day(s.sales_date)
ORDER BY _monthend;

I would actually like to do have it as a Pyspark / Snowflake code, so not sure if that is easier to do

答案1

得分: 1

以下是翻译好的部分:

创建一些假数据以展示步骤:

```sql
WITH fake_sales_data as (
    select 
        try_to_date(column1, 'yyyy-mm-dd') as sales_date 
    from values
        ('2010-09-30'),
        ('2011-03-31'),
        ('2012-07-31') 
), fake_customer(change_date, id, new_ind) as (
    select 
        try_to_date(column1, 'yyyy-mm-dd'), column2, column3 
    from values
        ('2010-09-21', 0, 1 ),
        ('2011-03-14', 1, 2 ),
        ('2011-06-15', 2, 3 )
)

然后我们可以将您的代码应用于此数据:

WITH fake_sales_data as (
    select 
        try_to_date(column1, 'yyyy-mm-dd') as sales_date 
    from values
        ('2010-09-30'),
        ('2011-03-31'),
        ('2012-07-31') 
), fake_customer(change_date, id, new_ind) as (
    select 
        try_to_date(column1, 'yyyy-mm-dd'), column2, column3 
    from values
        ('2010-09-21', 0, 1 ),
        ('2011-03-14', 1, 2 ),
        ('2011-06-15', 2, 3 )
), min_max_date AS (
    SELECT
        min(sales_date) AS min_date,
        max(sales_date) AS max_date
    FROM fake_sales_data
), generated_dates AS (
    SELECT
        last_day(dateadd('month', ROW_NUMBER() OVER(ORDER BY TRUE)-1, min_date)) as _monthend
    FROM TABLE(GENERATOR(ROWCOUNT => 100000))
    CROSS JOIN min_max_date 
    QUALIFY _monthend <= max_date
)
select
g._monthend as monthend,
h.id,
h.new_ind
from generated_dates as g
left join fake_customer as h 
    on g._monthend = last_day(h.change_date)
order by _monthend;

并获取日期,但有缺失:

MONTHEND ID NEW_IND
2010-09-30 0 1
2010-10-31 null null
2010-11-30 null null
2010-12-31 null null
2011-01-31 null null
2011-02-28 null null
2011-03-31 1 2
2011-04-30 null null
2011-05-31 null null
2011-06-30 2 3
2011-07-31 null null
2011-08-31 null null
2011-09-30 null null
2011-10-31 null null
2011-11-30 null null
2011-12-31 null null
2012-01-31 null null
2012-02-29 null null
2012-03-31 null null
2012-04-30 null null
2012-05-31 null null
2012-06-30 null null
2012-07-31 null null

现在我们想要使用前面提到的LAG来填充先前的newid

只显示最后的选择:

select
g._monthend as monthend,
h.id,
NVL(h.new_ind, LAG(h.new_ind) ignore nulls over(order by g._monthend)) as new_ind
from generated_dates as g
left join fake_customer as h 
    on g._monthend = last_day(h.change_date)
order by _monthend;

结果如下:

MONTHEND ID NEW_IND
2010-09-30 0 1
2010-10-31 null 1
2010-11-30 null 1
2010-12-31 null 1
2011-01-31 null 1
2011-02-28 null 1
2011-03-31 1 2
2011-04-30 null 2
2011-05-31 null 2
2011-06-30 2 3
2011-07-31 null 3
2011-08-31 null 3
2011-09-30 null 3
2011-10-31 null 3
2011-11-30 null 3
2011-12-31 null 3
2012-01-31 null 3
2012-02-29 null 3
2012-03-31 null 3
2012-04-30 null 3
2012-05-31 null 3
2012-06-30 null 3
2012-07-31 null 3

<details>
<summary>英文:</summary>

so creating some fake data to show the steps:

WITH fake_sales_data as (
select
try_to_date(column1, 'yyyy-mm-dd') as sales_date
from values
('2010-09-30'),
('2011-03-31'),
('2012-07-31')
), fake_customer(change_date, id, new_ind) as (
select
try_to_date(column1, 'yyyy-mm-dd'), column2, column3
from values
('2010-09-21', 0, 1 ),
('2011-03-14', 1, 2 ),
('2011-06-15', 2, 3 )
)


we can then apply your code to this:

WITH fake_sales_data as (
select
try_to_date(column1, 'yyyy-mm-dd') as sales_date
from values
('2010-09-30'),
('2011-03-31'),
('2012-07-31')
), fake_customer(change_date, id, new_ind) as (
select
try_to_date(column1, 'yyyy-mm-dd'), column2, column3
from values
('2010-09-21', 0, 1 ),
('2011-03-14', 1, 2 ),
('2011-06-15', 2, 3 )
), min_max_date AS (
SELECT
min(sales_date) AS min_date,
max(sales_date) AS max_date
FROM fake_sales_data
), generated_dates AS (
SELECT
last_day(dateadd('month', ROW_NUMBER() OVER(ORDER BY TRUE)-1, min_date)) as _monthend
FROM TABLE(GENERATOR(ROWCOUNT => 100000))
CROSS JOIN min_max_date
QUALIFY _monthend <= max_date
)
select
g._monthend as monthend,
h.id,
h.new_ind
from generated_dates as g
left join fake_customer as h
on g._monthend = last_day(h.change_date)
order by _monthend;


and get the dates, yippie, but gaps:

MONTHEND |ID |NEW_IND
--|--|--
2010-09-30 |0 |1
2010-10-31 |null |null
2010-11-30 |null |null
2010-12-31 |null |null
2011-01-31 |null |null
2011-02-28 |null |null
2011-03-31 |1 |2
2011-04-30 |null |null
2011-05-31 |null |null
2011-06-30 |2 |3
2011-07-31 |null |null
2011-08-31 |null |null
2011-09-30 |null |null
2011-10-31 |null |null
2011-11-30 |null |null
2011-12-31 |null |null
2012-01-31 |null |null
2012-02-29 |null |null
2012-03-31 |null |null
2012-04-30 |null |null
2012-05-31 |null |null
2012-06-30 |null |null
2012-07-31 |null |null

now we want to fill in the prior `newid`&#39;s with the before mentioned LAG

just showing the last select here:

select
g._monthend as monthend,
h.id,
NVL(h.new_ind, LAG(h.new_ind)ignore nulls over(order by g._monthend)) as new_ind
from generated_dates as g
left join fake_customer as h
on g._monthend = last_day(h.change_date)
order by _monthend;


gives:

MONTHEND |ID |NEW_IND
--|--|--
2010-09-30 |0 |1
2010-10-31 |null |1
2010-11-30 |null |1
2010-12-31 |null |1
2011-01-31 |null |1
2011-02-28 |null |1
2011-03-31 |1 |2
2011-04-30 |null |2
2011-05-31 |null |2
2011-06-30 |2 |3
2011-07-31 |null |3
2011-08-31 |null |3
2011-09-30 |null |3
2011-10-31 |null |3
2011-11-30 |null |3
2011-12-31 |null |3
2012-01-31 |null |3
2012-02-29 |null |3
2012-03-31 |null |3
2012-04-30 |null |3
2012-05-31 |null |3
2012-06-30 |null |3
2012-07-31 |null |3


</details>



huangapple
  • 本文由 发表于 2023年6月22日 08:44:13
  • 转载请务必保留本文链接:https://go.coder-hub.com/76527988.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定