英文:
Snowflake / Snowpark fill gaps in between date ranges with current value
问题
在Snowflake中,您想要填充monthend_dates
之间的间隙,并分配new_ind
,直到它发生变化。以下是修改后的查询,以实现您的要求:
WITH min_max_date AS (
SELECT
min(sales_date) AS min_date,
max(sales_date) AS max_date
FROM sales
),
generated_dates AS (
SELECT
last_day(dateadd('month', row_number() over (ORDER BY null), min_date)) as _monthend
FROM TABLE(GENERATOR(ROWCOUNT => 100000))
CROSS JOIN min_max_date QUALIFY _monthend <= max_date
),
ind_history AS (
SELECT
id,
new_ind,
sales_date
FROM public.customer
)
SELECT
g._monthend AS monthend,
COALESCE(s.new_ind, lag(s.new_ind) OVER (PARTITION BY s.id ORDER BY s.sales_date)) AS new_ind
FROM generated_dates g
LEFT JOIN ind_history s ON g._monthend = last_day(s.sales_date)
ORDER BY g._monthend;
这个查询将生成的日期(_monthend
)与public.customer
中的数据进行连接,并使用COALESCE
和LAG
函数来填充new_ind
的空隙,直到它发生变化。最后,按_monthend
对结果进行排序。
英文:
I have this query in Snowflake:
WITH min_max_date AS (
SELECT
min(sales_date) AS min_date,
max(sales_date) AS max_date
FROM sales),
generated_dates AS (
SELECT
last_day(dateadd('month', row_number() over (ORDER BY seq4())-1, min_date)) as _monthend
FROM TABLE(GENERATOR(ROWCOUNT => 100000))
CROSS JOIN min_max_date QUALIFY _monthend <= max_date
)
select
g._monthend as monthend,
h.id,
h.new_ind
from generated_dates g
left join public.customer h on g._monthend = last_day(h.change_date)
order by _monthend;
Which returns
monthend_date | id | new_ind |
---|---|---|
2010-09-30 | 123 | 1 |
2011-03-31 | 123 | 2 |
2012-07-31 | 123 | 3 |
I would like to fill in the gaps between monthend_dates with all the generated _monthend, and assign with new_ind
until it changes.
So the output should look like:
monthend_date | id | new_ind |
---|---|---|
2010-09-30 | 123 | 1 |
2010-10-31 | 123 | 1 |
2010-11-30 | 123 | 1 |
2010-12-31 | 123 | 1 |
2011-01-31 | 123 | 1 |
2011-02-31 | 123 | 1 |
2011-03-31 | 123 | 2 |
I tried using a coalesce and lag function, but it didnt work out:
Update: using the query as suggested by the comments, this still returns NULL on the _monthend values, except when there is a direct match on a single _monthend value.
WITH min_max_date AS (
SELECT
min(sales_date) AS min_date,
max(sales_date) AS max_date
FROM sales),
generated_dates AS (
SELECT
last_day(dateadd('month', row_number() over (ORDER BY null), min_date)) as _monthend
FROM TABLE(GENERATOR(ROWCOUNT => 100000))
CROSS JOIN min_max_date QUALIFY _monthend <= max_date
),
ind_history AS (
SELECT
id,
new_ind,
sales_date
FROM public.customer
)
SELECT
g._monthend AS monthend,
COALESCE(s.new_ind,
lag(s.new_ind) OVER (PARTITION BY id ORDER BY sales_date)) AS new_ind
FROM public.customer s
LEFT JOIN ind_history s ON g._monthend = last_day(s.sales_date)
ORDER BY _monthend;
I would actually like to do have it as a Pyspark / Snowflake code, so not sure if that is easier to do
答案1
得分: 1
以下是翻译好的部分:
创建一些假数据以展示步骤:
```sql
WITH fake_sales_data as (
select
try_to_date(column1, 'yyyy-mm-dd') as sales_date
from values
('2010-09-30'),
('2011-03-31'),
('2012-07-31')
), fake_customer(change_date, id, new_ind) as (
select
try_to_date(column1, 'yyyy-mm-dd'), column2, column3
from values
('2010-09-21', 0, 1 ),
('2011-03-14', 1, 2 ),
('2011-06-15', 2, 3 )
)
然后我们可以将您的代码应用于此数据:
WITH fake_sales_data as (
select
try_to_date(column1, 'yyyy-mm-dd') as sales_date
from values
('2010-09-30'),
('2011-03-31'),
('2012-07-31')
), fake_customer(change_date, id, new_ind) as (
select
try_to_date(column1, 'yyyy-mm-dd'), column2, column3
from values
('2010-09-21', 0, 1 ),
('2011-03-14', 1, 2 ),
('2011-06-15', 2, 3 )
), min_max_date AS (
SELECT
min(sales_date) AS min_date,
max(sales_date) AS max_date
FROM fake_sales_data
), generated_dates AS (
SELECT
last_day(dateadd('month', ROW_NUMBER() OVER(ORDER BY TRUE)-1, min_date)) as _monthend
FROM TABLE(GENERATOR(ROWCOUNT => 100000))
CROSS JOIN min_max_date
QUALIFY _monthend <= max_date
)
select
g._monthend as monthend,
h.id,
h.new_ind
from generated_dates as g
left join fake_customer as h
on g._monthend = last_day(h.change_date)
order by _monthend;
并获取日期,但有缺失:
MONTHEND | ID | NEW_IND |
---|---|---|
2010-09-30 | 0 | 1 |
2010-10-31 | null | null |
2010-11-30 | null | null |
2010-12-31 | null | null |
2011-01-31 | null | null |
2011-02-28 | null | null |
2011-03-31 | 1 | 2 |
2011-04-30 | null | null |
2011-05-31 | null | null |
2011-06-30 | 2 | 3 |
2011-07-31 | null | null |
2011-08-31 | null | null |
2011-09-30 | null | null |
2011-10-31 | null | null |
2011-11-30 | null | null |
2011-12-31 | null | null |
2012-01-31 | null | null |
2012-02-29 | null | null |
2012-03-31 | null | null |
2012-04-30 | null | null |
2012-05-31 | null | null |
2012-06-30 | null | null |
2012-07-31 | null | null |
现在我们想要使用前面提到的LAG来填充先前的newid
:
只显示最后的选择:
select
g._monthend as monthend,
h.id,
NVL(h.new_ind, LAG(h.new_ind) ignore nulls over(order by g._monthend)) as new_ind
from generated_dates as g
left join fake_customer as h
on g._monthend = last_day(h.change_date)
order by _monthend;
结果如下:
MONTHEND | ID | NEW_IND |
---|---|---|
2010-09-30 | 0 | 1 |
2010-10-31 | null | 1 |
2010-11-30 | null | 1 |
2010-12-31 | null | 1 |
2011-01-31 | null | 1 |
2011-02-28 | null | 1 |
2011-03-31 | 1 | 2 |
2011-04-30 | null | 2 |
2011-05-31 | null | 2 |
2011-06-30 | 2 | 3 |
2011-07-31 | null | 3 |
2011-08-31 | null | 3 |
2011-09-30 | null | 3 |
2011-10-31 | null | 3 |
2011-11-30 | null | 3 |
2011-12-31 | null | 3 |
2012-01-31 | null | 3 |
2012-02-29 | null | 3 |
2012-03-31 | null | 3 |
2012-04-30 | null | 3 |
2012-05-31 | null | 3 |
2012-06-30 | null | 3 |
2012-07-31 | null | 3 |
<details>
<summary>英文:</summary>
so creating some fake data to show the steps:
WITH fake_sales_data as (
select
try_to_date(column1, 'yyyy-mm-dd') as sales_date
from values
('2010-09-30'),
('2011-03-31'),
('2012-07-31')
), fake_customer(change_date, id, new_ind) as (
select
try_to_date(column1, 'yyyy-mm-dd'), column2, column3
from values
('2010-09-21', 0, 1 ),
('2011-03-14', 1, 2 ),
('2011-06-15', 2, 3 )
)
we can then apply your code to this:
WITH fake_sales_data as (
select
try_to_date(column1, 'yyyy-mm-dd') as sales_date
from values
('2010-09-30'),
('2011-03-31'),
('2012-07-31')
), fake_customer(change_date, id, new_ind) as (
select
try_to_date(column1, 'yyyy-mm-dd'), column2, column3
from values
('2010-09-21', 0, 1 ),
('2011-03-14', 1, 2 ),
('2011-06-15', 2, 3 )
), min_max_date AS (
SELECT
min(sales_date) AS min_date,
max(sales_date) AS max_date
FROM fake_sales_data
), generated_dates AS (
SELECT
last_day(dateadd('month', ROW_NUMBER() OVER(ORDER BY TRUE)-1, min_date)) as _monthend
FROM TABLE(GENERATOR(ROWCOUNT => 100000))
CROSS JOIN min_max_date
QUALIFY _monthend <= max_date
)
select
g._monthend as monthend,
h.id,
h.new_ind
from generated_dates as g
left join fake_customer as h
on g._monthend = last_day(h.change_date)
order by _monthend;
and get the dates, yippie, but gaps:
MONTHEND |ID |NEW_IND
--|--|--
2010-09-30 |0 |1
2010-10-31 |null |null
2010-11-30 |null |null
2010-12-31 |null |null
2011-01-31 |null |null
2011-02-28 |null |null
2011-03-31 |1 |2
2011-04-30 |null |null
2011-05-31 |null |null
2011-06-30 |2 |3
2011-07-31 |null |null
2011-08-31 |null |null
2011-09-30 |null |null
2011-10-31 |null |null
2011-11-30 |null |null
2011-12-31 |null |null
2012-01-31 |null |null
2012-02-29 |null |null
2012-03-31 |null |null
2012-04-30 |null |null
2012-05-31 |null |null
2012-06-30 |null |null
2012-07-31 |null |null
now we want to fill in the prior `newid`'s with the before mentioned LAG
just showing the last select here:
select
g._monthend as monthend,
h.id,
NVL(h.new_ind, LAG(h.new_ind)ignore nulls over(order by g._monthend)) as new_ind
from generated_dates as g
left join fake_customer as h
on g._monthend = last_day(h.change_date)
order by _monthend;
gives:
MONTHEND |ID |NEW_IND
--|--|--
2010-09-30 |0 |1
2010-10-31 |null |1
2010-11-30 |null |1
2010-12-31 |null |1
2011-01-31 |null |1
2011-02-28 |null |1
2011-03-31 |1 |2
2011-04-30 |null |2
2011-05-31 |null |2
2011-06-30 |2 |3
2011-07-31 |null |3
2011-08-31 |null |3
2011-09-30 |null |3
2011-10-31 |null |3
2011-11-30 |null |3
2011-12-31 |null |3
2012-01-31 |null |3
2012-02-29 |null |3
2012-03-31 |null |3
2012-04-30 |null |3
2012-05-31 |null |3
2012-06-30 |null |3
2012-07-31 |null |3
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论