2023年2月8日 13:18:18go评论108阅读模式

英文:

How to partition a table based on a pattern(in this case an increasing sub sequence) that is found in a column in presto sql

问题

我遇到了以下问题。我有一个表T，在其中有一个名为offset的列，它将始终是一个以零开头的递增整数子序列集合。

表 T：
**行号, offset, col_1, col_2, col_3 ....**
     1       0
     2      10
     3      22
     4       32
     5       0
     6       0
     7       12
     8       32 
     9       44

我想根据offset列对表进行分区，使属于每个严格递增子序列的行放入一个分区。在上面的示例中，行[1,2,3,4]，[0]，[0,12,32,44]是我正在寻找的分区。我该如何继续，或者在SQL中是否可能实现这个？我尝试使用lead和lag分区函数，想法是创建一个辅助列，存储分区号，然后根据这个列对结果表进行分组。但我认为这样的努力可能需要循环功能，而我不认为SQL中存在这样的功能。

英文:

I am encountering the following problem. I have a table T, in which there is column named offset which will always be a collection of increasing sub sequences of integers beginning with zero.

Table T:
**row_num, offset ,col_1, col_2, col_3 ....**
     1       0
     2      10
     3      22
     4       32
     5       0
     6       0
     7       12
     8       32 
     9       44

I want to partition the table based on offset column such that the rows belonging to each strictly increasing subsequence are put into one partition. In the above example rows [1,2,3,4] , [0], [0,12,32,44] are the partitions that i am looking for. How do i proceed or is this even possible in sql?

I tried using lead and lag partition functions, the idea was to come up with an auxiliary column which stores partition numbers and then group by the resulting table based on this column. But i believe such an effort would require the loop functionality which i don't think exists in sql.

答案1

得分: 4

您可以使用 MATCH_RECOGNIZE 来识别满足某种模式的行。在您的情况下，您可以定义一个递增值 offset 的模式：

WITH t(row_num, offset) AS (
    VALUES
             (1, 0),
             (2, 10),
             (3, 22),
             (4, 32),
             (5, 0),
             (6, 0),
             (7, 12),
             (8, 32),
             (9, 44)
)
SELECT row_num, offset, partition FROM t
    MATCH_RECOGNIZE (
        ORDER BY row_num
        MEASURES MATCH_NUMBER() AS partition
        ALL ROWS PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (START UP*)
        DEFINE UP AS offset > PREV(offset)
    )

上面查询中的 MATCH_RECOGNIZE 子句的含义是：

查找符合模式 START UP* 的一系列行（按 row_num 排序）。START 元素应该出现一次，其后可以出现零次或多次 UP 元素。START 和 UP 是用来表示序列中的行的任意名称。START 被隐式地定义，它匹配任何行，而 UP 被定义为其 offset 列的值大于序列中前一行的 offset 列的值（DEFINE UP AS offset > PREV(offset)）。
计算一个名为 partition 的输出列，其值是与每个匹配项关联的合成数字（MEASURES MATCH_NUMBER() AS partition）。
输出所有带有 partition 列注释的输入行（ALL ROWS PER MATCH）。

该查询会产生以下结果：

 row_num | offset | partition
---------+--------+-----------
       1 |      0 |         1
       2 |     10 |         1
       3 |     22 |         1
       4 |     32 |         1
       5 |      0 |         2
       6 |      0 |         3
       7 |     12 |         3
       8 |     32 |         3
       9 |     44 |         3
(9 rows)

详细信息，请参阅：

请注意，此功能仅在 Trino 和基于 Trino 的 Athena 版本 3 中可用。

英文:

You can use MATCH_RECOGNIZE to identify rows that satisfy a pattern. In your case, you can define a pattern of increasing values of the column offset:

WITH t(row_num, offset) AS (
    VALUES
             (1, 0),
             (2, 10),
             (3, 22),
             (4, 32),
             (5, 0),
             (6, 0),
             (7, 12),
             (8, 32),
             (9, 44)
)
SELECT row_num, offset, partition FROM t
    MATCH_RECOGNIZE (
        ORDER BY row_num
        MEASURES MATCH_NUMBER() AS partition
        ALL ROWS PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (START UP*)
        DEFINE UP AS offset &gt; PREV(offset)
    )

What the MATCH_RECOGNIZE clause says in the query above is:

Find a sequence of rows (ordered by row_num) that matches the pattern START UP*. The START element should appear once and the UP element should appear zero or more times following it. START and UP are arbitrary names to represent rows in the sequence. START is defined implicitly and it matches any row, while UP is defined as a row whose offset column has a value larger than the offset column of the previous row in the sequence (DEFINE UP AS offset > PREV(offset))
Compute an output column named partition whose value is a synthetic number associated with each match. (MEASURES MATCH_NUMBER() AS partition)
Output all the input rows with the partition column annotation. (ALL ROWS PER MATCH)

The query produces:

 row_num | offset | partition
---------+--------+-----------
       1 |      0 |         1
       2 |     10 |         1
       3 |     22 |         1
       4 |     32 |         1
       5 |      0 |         2
       6 |      0 |         3
       7 |     12 |         3
       8 |     32 |         3
       9 |     44 |         3
(9 rows)

For more details, please see:

Note that this functionality is only available in Trino and Athena version 3, which is based on Trino.

答案2

得分: 1

你可以使用一个运行总和函数，在row_num递增时，每当发现偏移值为零时，总和加1：

select row_num, offset,
  sum(case when offset = 0 then 1 else 0 end) over (order by row_num) prt
from table_name

此查询的输出结果为：

row_num   offset   prt
1         0        1
2         10       1
3         22       1
4         32       1
5         0        2
6         0        3
7         12       3
8         32       3
9         44       3

英文:

You could use a running sum function that increased by 1 whenever a zero offset value is found over the increasing of row_num:

select row_num, offset,
  sum(case when offset = 0 then 1 else 0 end) over (order by row_num) prt
from table_name

The output of this query:

row_num	  offset	prt
1	      0	        1
2	      10	    1
3	      22	    1
4	      32	    1
5	      0	        2
6	      0	        3
7	      12    	3
8	      32	    3
9	      44	    3

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

How to partition a table based on a pattern(in this case an increasing sub sequence) that is found in a column in presto sql

问题

答案1

答案2

List在EntityFramework Entity中如何工作？

SAS Studio – 从宏变量列表计数并筛选多个列

根据相同的ID值更改条件，更新其他行的子字符串值。

mySQL查询以返回给定时期的最短状态持续时间

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。