英文:
How to partition a table based on a pattern(in this case an increasing sub sequence) that is found in a column in presto sql
问题
我遇到了以下问题。我有一个表T,在其中有一个名为offset的列,它将始终是一个以零开头的递增整数子序列集合。
表 T:
**行号, offset, col_1, col_2, col_3 ....**
1 0
2 10
3 22
4 32
5 0
6 0
7 12
8 32
9 44
我想根据offset列对表进行分区,使属于每个严格递增子序列的行放入一个分区。在上面的示例中,行[1,2,3,4],[0],[0,12,32,44]是我正在寻找的分区。我该如何继续,或者在SQL中是否可能实现这个?我尝试使用lead和lag分区函数,想法是创建一个辅助列,存储分区号,然后根据这个列对结果表进行分组。但我认为这样的努力可能需要循环功能,而我不认为SQL中存在这样的功能。
英文:
I am encountering the following problem. I have a table T, in which there is column named offset which will always be a collection of increasing sub sequences of integers beginning with zero.
Table T:
**row_num, offset ,col_1, col_2, col_3 ....**
1 0
2 10
3 22
4 32
5 0
6 0
7 12
8 32
9 44
I want to partition the table based on offset column such that the rows belonging to each strictly increasing subsequence are put into one partition. In the above example rows [1,2,3,4] , [0], [0,12,32,44] are the partitions that i am looking for. How do i proceed or is this even possible in sql?
I tried using lead and lag partition functions, the idea was to come up with an auxiliary column which stores partition numbers and then group by the resulting table based on this column. But i believe such an effort would require the loop functionality which i don't think exists in sql.
答案1
得分: 4
您可以使用 MATCH_RECOGNIZE
来识别满足某种模式的行。在您的情况下,您可以定义一个递增值 offset
的模式:
WITH t(row_num, offset) AS (
VALUES
(1, 0),
(2, 10),
(3, 22),
(4, 32),
(5, 0),
(6, 0),
(7, 12),
(8, 32),
(9, 44)
)
SELECT row_num, offset, partition FROM t
MATCH_RECOGNIZE (
ORDER BY row_num
MEASURES MATCH_NUMBER() AS partition
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START UP*)
DEFINE UP AS offset > PREV(offset)
)
上面查询中的 MATCH_RECOGNIZE
子句的含义是:
- 查找符合模式
START UP*
的一系列行(按row_num
排序)。START
元素应该出现一次,其后可以出现零次或多次UP
元素。START
和UP
是用来表示序列中的行的任意名称。START
被隐式地定义,它匹配任何行,而UP
被定义为其offset
列的值大于序列中前一行的offset
列的值(DEFINE UP AS offset > PREV(offset)
)。 - 计算一个名为
partition
的输出列,其值是与每个匹配项关联的合成数字(MEASURES MATCH_NUMBER() AS partition
)。 - 输出所有带有
partition
列注释的输入行(ALL ROWS PER MATCH
)。
该查询会产生以下结果:
row_num | offset | partition
---------+--------+-----------
1 | 0 | 1
2 | 10 | 1
3 | 22 | 1
4 | 32 | 1
5 | 0 | 2
6 | 0 | 3
7 | 12 | 3
8 | 32 | 3
9 | 44 | 3
(9 rows)
详细信息,请参阅:
- https://trino.io/docs/current/sql/match-recognize.html
- https://trino.io/blog/2021/05/19/row_pattern_matching.html
请注意,此功能仅在 Trino 和基于 Trino 的 Athena 版本 3 中可用。
英文:
You can use MATCH_RECOGNIZE
to identify rows that satisfy a pattern. In your case, you can define a pattern of increasing values of the column offset
:
WITH t(row_num, offset) AS (
VALUES
(1, 0),
(2, 10),
(3, 22),
(4, 32),
(5, 0),
(6, 0),
(7, 12),
(8, 32),
(9, 44)
)
SELECT row_num, offset, partition FROM t
MATCH_RECOGNIZE (
ORDER BY row_num
MEASURES MATCH_NUMBER() AS partition
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (START UP*)
DEFINE UP AS offset > PREV(offset)
)
What the MATCH_RECOGNIZE
clause says in the query above is:
- Find a sequence of rows (ordered by
row_num
) that matches the patternSTART UP*
. TheSTART
element should appear once and theUP
element should appear zero or more times following it.START
andUP
are arbitrary names to represent rows in the sequence.START
is defined implicitly and it matches any row, whileUP
is defined as a row whoseoffset
column has a value larger than theoffset
column of the previous row in the sequence (DEFINE UP AS offset > PREV(offset)
) - Compute an output column named
partition
whose value is a synthetic number associated with each match. (MEASURES MATCH_NUMBER() AS partition
) - Output all the input rows with the
partition
column annotation. (ALL ROWS PER MATCH
)
The query produces:
row_num | offset | partition
---------+--------+-----------
1 | 0 | 1
2 | 10 | 1
3 | 22 | 1
4 | 32 | 1
5 | 0 | 2
6 | 0 | 3
7 | 12 | 3
8 | 32 | 3
9 | 44 | 3
(9 rows)
For more details, please see:
- https://trino.io/docs/current/sql/match-recognize.html
- https://trino.io/blog/2021/05/19/row_pattern_matching.html
Note that this functionality is only available in Trino and Athena version 3, which is based on Trino.
答案2
得分: 1
你可以使用一个运行总和函数,在row_num递增时,每当发现偏移值为零时,总和加1:
select row_num, offset,
sum(case when offset = 0 then 1 else 0 end) over (order by row_num) prt
from table_name
此查询的输出结果为:
row_num offset prt
1 0 1
2 10 1
3 22 1
4 32 1
5 0 2
6 0 3
7 12 3
8 32 3
9 44 3
英文:
You could use a running sum function that increased by 1 whenever a zero offset value is found over the increasing of row_num:
select row_num, offset,
sum(case when offset = 0 then 1 else 0 end) over (order by row_num) prt
from table_name
The output of this query:
row_num offset prt
1 0 1
2 10 1
3 22 1
4 32 1
5 0 2
6 0 3
7 12 3
8 32 3
9 44 3
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论