英文:
FIRST_VALUE in Athena or Spark
问题
以下是你要的翻译内容:
select id
,id2
,FIRST_VALUE(CASE WHEN app THEN date0 ELSE NULL END) IGNORE NULLS OVER (PARTITION BY id ORDER BY date0) as date_result
from (
select 1 id, 22 as id2, false app, Date'2019-03-13' as date0
union
select 1 id, 23 as id2, true app, Date'2019-03-14' as date0
union
select 1 id, 23 as id2, true app, Date'2019-03-15' as date0
)
上述查询在Athena中返回如下:
id | id2 | date_result |
---|---|---|
1 | 22 | |
1 | 23 | 2019-03-14 |
1 | 23 | 2019-03-14 |
但我期望的结果如下,因为我们忽略了null值并按id对date_result进行分区:
id | id2 | date_result |
---|---|---|
1 | 22 | 2019-03-14 |
1 | 23 | 2019-03-14 |
1 | 23 | 2019-03-14 |
请问我在FIRST_VALUE中做错了什么?在Athena和Spark中实现这个结果的最佳方法是什么?谢谢。
英文:
select id
,id2
,FIRST_VALUE(CASE WHEN app THEN date0 ELSE NULL END) IGNORE NULLS OVER (PARTITION BY id ORDER BY date0) as date_result
from (
select 1 id, 22 as id2, false app, Date'2019-03-13' as date0
union
select 1 id, 23 as id2, true app, Date'2019-03-14' as date0
union
select 1 id, 23 as id2, true app, Date'2019-03-15' as date0
)
Above query is returning like below in Athena
id | id2 | date_result |
---|---|---|
1 | 22 | |
1 | 23 | 2019-03-14 |
1 | 23 | 2019-03-14 |
But I was expecting like below since we do ignore nulls and partition by id for date_result
id | id2 | date_result |
---|---|---|
1 | 22 | 2019-03-14 |
1 | 23 | 2019-03-14 |
1 | 23 | 2019-03-14 |
Could you please let me know what I am doing wrong in first_value? what is the best way to achieve this result in both Athena and spark? Thanks
I have added it in the description
答案1
得分: 0
> 你能告诉我在 first_value 函数中我做错了什么吗?
默认的窗口函数1的窗口范围是从无限制的前导行到当前行:
>> 如果没有指定 frame_end,将使用默认值 CURRENT ROW。
>> 如果未指定窗口范围,则使用默认窗口范围 RANGE UNBOUNDED PRECEDING。
如果你想要在整个分区内查找数值,你需要指定窗口范围,例如:
with dataset(id, id2, app, date0) as (
values (1, 22, false, Date'2019-03-13'),
(1, 23, true ,Date'2019-03-14'),
(1, 23, true ,Date'2019-03-15')
)
select id
, id2
, FIRST_VALUE(if(app, date0)) IGNORE NULLS
OVER (PARTITION BY id ORDER BY date0 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as date_result
from dataset;
英文:
> Could you please let me know what I am doing wrong in first_value?
default frame for windows functions is unbounded preceding - current row:
>> If frame_end is not specified, a default value of CURRENT ROW is used.
>> If no frame is specified, a default frame of RANGE UNBOUNDED PRECEDING is used.
If you want to find value across the whole partition you need to specify the frame, for example:
with dataset(id, id2, app, date0) as (
values (1, 22, false, Date'2019-03-13'),
(1, 23, true ,Date'2019-03-14'),
(1, 23, true ,Date'2019-03-15')
)
select id
, id2
, FIRST_VALUE(if(app, date0)) IGNORE NULLS
OVER (PARTITION BY id ORDER BY date0 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as date_result
from dataset;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论