January 6, 2023, 13:44:39

match_recognize: collect row data into a single column

Question

I'm following the tutorial for match_recognize found here:

create or replace temporary table stock_price_history (company text, price_date date, price int);
insert into stock_price_history values
    ('ABCD', '2020-10-01', 50),
    ('ABCD', '2020-10-02', 50),
    ('ABCD', '2020-10-03', 51),
    ('ABCD', '2020-10-04', 51),
    ('ABCD', '2020-10-05', 51),
    ('ABCD', '2020-10-06', 52),
    ('ABCD', '2020-10-07', 71),
    ('ABCD', '2020-10-08', 80),
    ('ABCD', '2020-10-09', 90),
    ('ABCD', '2020-10-10', 63),
    ('XYZ' , '2020-10-01', 24),
    ('XYZ' , '2020-10-02', 24),
    ('XYZ' , '2020-10-03', 37),
    ('XYZ' , '2020-10-04', 63),
    ('XYZ' , '2020-10-05', 65),
    ('XYZ' , '2020-10-06', 66),
    ('XYZ' , '2020-10-07', 50),
    ('XYZ' , '2020-10-08', 54),
    ('XYZ' , '2020-10-09', 30),
    ('XYZ' , '2020-10-10', 32);
    
select * from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number,
      price as all_price,
      first(price_date) as start_date,
      last(price_date) as end_date,
      count(*) as rows_in_sequence,
      count(row_with_price_stationary.*) as num_stationary,
      count(row_with_price_increase.*) as num_increases
    one row per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
order by company, match_number;

The code above is my version of the tutorial code. Everything works fine except the price as all_price part in the measures clause. What I want to do is collect all prices in the pattern and return them as an array in a single column. I know I can use all rows per match to get all rows, but that's not what I want.

How would I go about doing that?

Answer 1

Score: 1


You have to specify all rows per match; with one row per match, the individual row values are lost by the time they leave the match_recognize function. You can then use array_agg with within group to collect the prices into a single array. Since this aggregates the rows down, you may want to do the same for the dates of each of these prices, something like this:

select   COMPANY
        ,array_agg(PRICE) within group (order by PRICE_DATE) as ALL_PRICE
        ,array_agg(PRICE_DATE) within group (order by PRICE_DATE) as ALL_PRICE_DATE
from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number,
      price as all_price,
      first(price_date) as start_date,
      last(price_date) as end_date,
      count(*) as rows_in_sequence,
      count(row_with_price_stationary.*) as num_stationary,
      count(row_with_price_increase.*) as num_increases
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
group by company
order by company
;

COMPANY	ALL_PRICE	ALL_PRICE_DATE
ABCD	[ 50, 51, 51, 51, 52, 52, 71, 80 ]	[ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05", "2020-10-06", "2020-10-06", "2020-10-07", "2020-10-08" ]
XYZ	[ 24, 37, 63, 63, 65, 66 ]	[ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-04", "2020-10-05", "2020-10-06" ]

If you want to keep all rows, you can use the window function version of array_agg:

select   * exclude ALL_PRICE
        ,array_agg(PRICE) within group (order by PRICE_DATE)
            over (partition by COMPANY) as ALL_PRICE
from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number,
      price as all_price,
      first(price_date) as start_date,
      last(price_date) as end_date,
      count(*) as rows_in_sequence,
      count(row_with_price_stationary.*) as num_stationary,
      count(row_with_price_increase.*) as num_increases
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
order by company
;
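
Note that grouping by company alone merges every match for that company into one array. With after match skip to last row_with_price_increase, the last row of one match is also the first row of the next, which is why 52 (ABCD) and 63 (XYZ) each appear twice in the output of the first query. If one array per match is what you are after, a possible variation is to group by the match number as well. This is only a sketch, assuming the same stock_price_history table and Snowflake's array_agg; the measures list is trimmed to match_number for brevity, and start_date / end_date are recomputed in the outer query with min and max:

```sql
-- Sketch: one array of prices per (company, match) instead of per company.
-- Assumes the stock_price_history table defined in the question.
select   company
        ,match_number
        ,array_agg(price) within group (order by price_date) as all_price
        ,min(price_date) as start_date   -- first date in the match
        ,max(price_date) as end_date     -- last date in the match
from stock_price_history
  match_recognize(
    partition by company
    order by price_date
    measures
      match_number() as match_number
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
      row_with_price_increase as price > lag(price),
      row_with_price_stationary as price = lag(price)
  )
group by company, match_number
order by company, match_number
;
```

The same idea should carry over to the window-function version by using over (partition by company, match_number) instead of partitioning by company alone.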
