英文:
Is there a way to collapse ordered rows by terminal values with postgres window clause
问题
以下是代码的翻译部分:
我有一个名为foo的表:
| some_fk | some_field | some_date_field |
| ------- | ---------- | --------------- |
| 1 | A | 1990-01-01 |
| 1 | B | 1990-01-02 |
| 1 | C | 1990-03-01 |
| 1 | X | 1990-04-01 |
| 2 | B | 1990-01-01 |
| 2 | B | 1990-01-05 |
| 2 | Z | 1991-04-11 |
| 2 | C | 1992-01-01 |
| 2 | B | 1992-02-01 |
| 2 | Y | 1992-03-01 |
| 3 | C | 1990-01-01 |
`some_field` 有6个可能的值:`[A,B,C,X,Y,Z]`
其中 `[A,B,C]` 表示开启或继续事件,`[X,Y,Z]` 表示关闭事件。我该如何获取每个时间段在每个`some_fk`分区中的第一个开启事件和关闭事件之间的时间跨度,如下表所示:
| some_fk | some_date_field_start | some_date_field_end |
| ------- | --------------------- | ------------------- |
| 1 | 1990-01-01 | 1990-04-01 |
| 2 | 1990-01-01 | 1991-04-11 |
| 2 | 1992-01-01 | 1992-03-01 |
| 3 | 1990-01-01 | NULL |
*_请注意,非终止的时间跨度以NULL结束_
我确实有一个涉及3个公共表达式的解决方案,但我想知道是否有一种在PostgreSQL中不使用嵌套查询的更好/更优雅/更规范的方法。
我的方法类似于:
```sql
WITH ranked AS (
SELECT
RANK() OVER (PARTITION BY some_fk ORDER BY some_date_field) AS "rank"
some_fk,
some_field,
some_date_field
FROM foo
), openers AS (
SELECT * FROM ranked WHERE some_field IN ('A','B','C')
), closers AS (
SELECT
*,
LAG("rank") OVER (PARTITION BY some_fk ORDER BY "rank") AS rank_lag
FROM ranked WHERE some_field IN ('X','Y','Z')
)
SELECT DISTINCT
openers.some_fk,
FIRST_VALUE(openers.some_date_field) OVER (PARTITION BY some_fk ORDER BY "rank")
AS some_date_field_start,
closers.some_date_field AS some_date_field_end
FROM openers
JOIN closers
ON openers.some_fk = closers.some_fk
WHERE openers.some_date_field BETWEEN COALESCE(closers.rank_lag, 0) AND closers.rank
...但我觉得一定有更好的方法。
提前感谢您的帮助。
英文:
I have a table foo:
some_fk | some_field | some_date_field |
---|---|---|
1 | A | 1990-01-01 |
1 | B | 1990-01-02 |
1 | C | 1990-03-01 |
1 | X | 1990-04-01 |
2 | B | 1990-01-01 |
2 | B | 1990-01-05 |
2 | Z | 1991-04-11 |
2 | C | 1992-01-01 |
2 | B | 1992-02-01 |
2 | Y | 1992-03-01 |
3 | C | 1990-01-01 |
some_field
has 6 possible values: [A,B,C,X,Y,Z]
Where [A,B,C]
signify opening or continuation events and [X,Y,Z]
signify closing events. How do I get each span of time between the first opening event and closing event of each span, partitioned by some_fk
, as shown in the table below:
some_fk | some_date_field_start | some_date_field_end |
---|---|---|
1 | 1990-01-01 | 1990-04-01 |
2 | 1990-01-01 | 1991-04-11 |
2 | 1992-01-01 | 1992-03-01 |
3 | 1990-01-01 | NULL |
*Note that a non-terminated time span ends with NULL
I do have a solution that involves 3 common table expressions, but I'm wondering if there is a (better/more elegant/canonical) way to do this in PostgreSQL without nested queries.
My approach was something like:
WITH ranked AS (
SELECT
RANK() OVER (PARTITION BY some_fk ORDER BY some_date_field) AS "rank"
some_fk,
some_field,
some_date_field
FROM foo
), openers AS (
SELECT * FROM ranked WHERE some_field IN ('A','B','C')
), closers AS (
SELECT
*,
LAG("rank") OVER (PARTITION BY some_fk ORDER BY "rank") AS rank_lag
FROM ranked WHERE some_field IN ('X','Y','Z')
)
SELECT DISTINCT
openers.some_fk,
FIRST_VALUE(openers.some_date_field) OVER (PARTITION BY some_fk ORDER BY "rank")
AS some_date_field_start,
closers.some_date_field AS some_date_field_end
FROM openers
JOIN closers
ON openers.some_fk = closers.some_fk
WHERE openers.some_date_field BETWEEN COALESCE(closers.rank_lag, 0) AND closers.rank
... but I feel there must be a better way.
Thanks in advance for the help.
答案1
得分: 1
以下是翻译好的部分:
查询的基础是使用LAG
来确定前一条记录是否是封闭的。
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
这允许您从期望的结果中筛选出正确的4条记录,包括前两列;您的错误是将WHERE
子句直接放在查询中,而您想要的是将其用作子查询,并在主查询中编写WHERE
子句。从那一点开始,您有几种方式完成查询。
以下是使用标量子查询的版本:
SELECT some_fk, some_date_field AS some_date_field_start,
(
SELECT MIN(some_date_field)
FROM foo
WHERE some_fk = F.some_fk AND some_date_field > F.some_date_field AND some_field IN ('X','Y','Z')
) AS some_date_field_end
FROM (
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
) F
WHERE some_field IN ('A','B','C')
AND COALESCE(previous_some_field,'Z') IN ('X','Y','Z')
以下是使用CROSS JOIN LATERAL
的另一版本:
SELECT some_fk, some_date_field AS some_date_field_start, some_date_field_end
FROM (
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
) F1
CROSS JOIN LATERAL (
SELECT MIN(some_date_field) AS some_date_field_end
FROM foo
WHERE some_fk = F1.some_fk AND some_date_field > F1.some_date_field AND some_field IN ('X','Y','Z')
) F2
WHERE some_field IN ('A','B','C')
AND COALESCE(previous_some_field,'Z') IN ('X','Y','Z')
英文:
The basis of the query is to use LAG
to determine if the previous record was a closure.
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
This allows you to filter on the correct 4 records from your expected results, with the first 2 columns included; your mistake was to put the WHERE
clause onto that query directly, when what you want to do is use it as is in a sub-query and write the WHERE
in the main query.<br/>From that point, you have several possibilities to finish the query.
Here is a version using a scalar subquery:
SELECT some_fk, some_date_field AS some_date_field_start,
(
SELECT MIN(some_date_field)
FROM foo
WHERE some_fk = F.some_fk AND some_date_field > F.some_date_field AND some_field IN ('X','Y','Z')
) AS some_date_field_end
FROM (
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
) F
WHERE some_field IN ('A','B','C')
AND COALESCE(previous_some_field,'Z') IN ('X','Y','Z')
Here is another version using a CROSS JOIN LATERAL
:
SELECT some_fk, some_date_field AS some_date_field_start, some_date_field_end
FROM (
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
) F1
CROSS JOIN LATERAL (
SELECT MIN(some_date_field) AS some_date_field_end
FROM foo
WHERE some_fk = F1.some_fk AND some_date_field > F1.some_date_field AND some_field IN ('X','Y','Z')
) F2
WHERE some_field IN ('A','B','C')
AND COALESCE(previous_some_field,'Z') IN ('X','Y','Z')
答案2
得分: 1
另一种方法是通过创建累积的关闭事件总和来创建分组ID。然后在外部SQL中,您可以按组进行分组并选择最小和最大日期。
选择一些外键,min(some_date)作为some_date_field_start,max(some_date)作为some_date_field_end
从 (
选择一些外键,some_date,
Sum(Case When some_field in ('X','Y','Z') Then 1 Else 0 End)
Over (Partition By some_fk Order By some_date
Rows Between Unbounded Preceding And 1 Preceding)
as some_grouping
从 foo
)
按一些外键,some_grouping进行分组
按一些外键,some_grouping进行排序
这对我来说看起来简单一些。
英文:
Another approach is to create a grouping ID by creating a running sum of the closing events. Then in an outer SQL you can Group By and pick min() and max() dates.
Select some_fk,min(some_date) as some_date_field_start, max(some_date) as some_date_field _end
From (
Select some_fk,some_date,
Sum(Case When some_field in ('X','Y','Z') Then 1 Else 0 End)
Over (Partition By some_fk Order By some_date
Rows Between Unbounded Preceding And 1 Preceding)
as some_grouping
From foo
)
Group By some_fk,some_grouping
Order By some_fk,some_grouping
This seems a little simpler at least to me.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论