February 10, 2023, 04:48:04

FIRST_VALUE in Athena or Spark

Question

select id
      ,id2
      ,FIRST_VALUE(CASE WHEN app THEN date0 ELSE NULL END) IGNORE NULLS OVER (PARTITION BY id ORDER BY date0) as date_result
from (
select 1 id, 22 as id2, false app, Date '2019-03-13' as date0
union
select 1 id, 23 as id2, true app, Date '2019-03-14' as date0
union
select 1 id, 23 as id2, true app, Date '2019-03-15' as date0
)

The query above returns the following in Athena:

id	id2	date_result
1	22
1	23	2019-03-14
1	23	2019-03-14

But I was expecting the result below, since the query ignores nulls and partitions by id for date_result:

id	id2	date_result
1	22	2019-03-14
1	23	2019-03-14
1	23	2019-03-14

Could you please let me know what I am doing wrong with FIRST_VALUE? What is the best way to achieve this result in both Athena and Spark? Thanks.

Answer 1

Score: 0

> Could you please let me know what I am doing wrong in first_value?

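The default frame for window functions runs from UNBOUNDED PRECEDING to CURRENT ROW, so the frame of the first row (2019-03-13) contains only that row, where the CASE expression is NULL: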

>> If frame_end is not specified, a default value of CURRENT ROW is used.

>> If no frame is specified, a default frame of RANGE UNBOUNDED PRECEDING is used.

If you want to find the value across the whole partition, you need to specify the frame explicitly, for example:

with dataset(id, id2, app, date0) as (
    values (1, 22, false, Date '2019-03-13'),
    (1, 23, true, Date '2019-03-14'),
    (1, 23, true, Date '2019-03-15')
)
select id
     , id2
     , FIRST_VALUE(if(app, date0)) IGNORE NULLS
        OVER (PARTITION BY id ORDER BY date0 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as date_result
from dataset;
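
To see the effect of the frame without an Athena or Spark session, here is a minimal sketch using Python's built-in sqlite3 module. It is not the Athena query itself: it assumes a SQLite build with window-function support (3.25 or later), stores app as 0/1 and the dates as ISO text, and swaps in MIN(CASE WHEN app THEN date0 END) as a stand-in for FIRST_VALUE(...) IGNORE NULLS. The swap only holds here because the window is ordered by date0, so the first non-null date is also the smallest one. The helper name date_results is made up for this illustration.

```python
import sqlite3

# Sample rows from the question (assumption: app as 0/1, dates as ISO text).
DATASET_CTE = """
WITH dataset(id, id2, app, date0) AS (
    VALUES (1, 22, 0, '2019-03-13'),
           (1, 23, 1, '2019-03-14'),
           (1, 23, 1, '2019-03-15')
)
"""


def date_results(frame_clause: str) -> list:
    """Return (id2, date_result) rows for the given window frame clause.

    MIN over the CASE expression stands in for FIRST_VALUE ... IGNORE NULLS;
    this is an illustrative substitute, not the original Athena syntax.
    """
    query = DATASET_CTE + f"""
    SELECT id2,
           MIN(CASE WHEN app THEN date0 END)
               OVER (PARTITION BY id ORDER BY date0 {frame_clause}) AS date_result
    FROM dataset
    ORDER BY date0
    """
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# No frame clause: the default frame ends at the current row.
default_frame = date_results("")
# Explicit frame covering the whole partition.
full_frame = date_results(
    "RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
)

print(default_frame)
print(full_frame)
```

With the default frame, the 2019-03-13 row comes back with None, matching the Athena output in the question; with the explicit frame, every row gets 2019-03-14. Since the same MIN expression with a plain PARTITION BY id window (no ORDER BY) also covers the whole partition, it may be a portable option for this particular case, though that has not been checked against Athena or Spark here.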
