match recognize collect row data into single column
Question
I'm following the tutorial for match_recognize found here:
create or replace temporary table stock_price_history (company text, price_date date, price int);
insert into stock_price_history values
('ABCD', '2020-10-01', 50),
('ABCD', '2020-10-02', 50),
('ABCD', '2020-10-03', 51),
('ABCD', '2020-10-04', 51),
('ABCD', '2020-10-05', 51),
('ABCD', '2020-10-06', 52),
('ABCD', '2020-10-07', 71),
('ABCD', '2020-10-08', 80),
('ABCD', '2020-10-09', 90),
('ABCD', '2020-10-10', 63),
('XYZ' , '2020-10-01', 24),
('XYZ' , '2020-10-02', 24),
('XYZ' , '2020-10-03', 37),
('XYZ' , '2020-10-04', 63),
('XYZ' , '2020-10-05', 65),
('XYZ' , '2020-10-06', 66),
('XYZ' , '2020-10-07', 50),
('XYZ' , '2020-10-08', 54),
('XYZ' , '2020-10-09', 30),
('XYZ' , '2020-10-10', 32);
select * from stock_price_history
match_recognize(
    partition by company
    order by price_date
    measures
        match_number() as match_number,
        price as all_price,
        first(price_date) as start_date,
        last(price_date) as end_date,
        count(*) as rows_in_sequence,
        count(row_with_price_stationary.*) as num_stationary,
        count(row_with_price_increase.*) as num_increases
    one row per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
        row_with_price_increase as price > lag(price),
        row_with_price_stationary as price = lag(price)
)
order by company, match_number;
The code above is my version of the tutorial code. Everything works fine except the price as all_price part in the measures clause. What I want to do is collect all prices in the pattern and return them as an array in a single column. I know I can use all rows per match to get all rows, but that's not what I want.
How would I go about doing that?
Answer 1
Score: 1
You have to specify all rows per match, or the per-row information is lost inside the match_recognize clause. You can then use array_agg with within group to get the prices in a single array. Since this aggregates the rows down to one per company, you may want to do the same for the dates of each of these prices, something like this:
select COMPANY
      ,array_agg(PRICE) within group (order by PRICE_DATE) as ALL_PRICE
      ,array_agg(PRICE_DATE) within group (order by PRICE_DATE) as ALL_PRICE_DATE
from stock_price_history
match_recognize(
    partition by company
    order by price_date
    measures
        match_number() as match_number,
        price as all_price,
        first(price_date) as start_date,
        last(price_date) as end_date,
        count(*) as rows_in_sequence,
        count(row_with_price_stationary.*) as num_stationary,
        count(row_with_price_increase.*) as num_increases
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
        row_with_price_increase as price > lag(price),
        row_with_price_stationary as price = lag(price)
)
group by company
order by company
;
COMPANY | ALL_PRICE | ALL_PRICE_DATE |
---|---|---|
ABCD | [ 50, 51, 51, 51, 52, 52, 71, 80 ] | [ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05", "2020-10-06", "2020-10-06", "2020-10-07", "2020-10-08" ] |
XYZ | [ 24, 37, 63, 63, 65, 66 ] | [ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-04", "2020-10-05", "2020-10-06" ] |
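Note that the arrays above are built per company, so a row shared by two consecutive matches shows up twice (the repeated 52 for ABCD and the repeated 63 for XYZ). That happens because after match skip to last row_with_price_increase lets the last row of one match also start the next one. If you would rather have one array per match, a minimal sketch, assuming the same table and pattern, is to group by the match_number measure as well (the measures list is trimmed here only for brevity):

```sql
-- Sketch: one array of prices per (company, match) instead of per company.
select company
      ,match_number
      ,array_agg(price) within group (order by price_date) as all_price
      ,array_agg(price_date) within group (order by price_date) as all_price_date
from stock_price_history
match_recognize(
    partition by company
    order by price_date
    measures
        match_number() as match_number   -- needed so the outer query can group by it
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
        row_with_price_increase as price > lag(price),
        row_with_price_stationary as price = lag(price)
)
group by company, match_number
order by company, match_number
;
```

On the sample data this should split ABCD into [ 50, 51, 51, 51, 52 ] and [ 52, 71, 80 ], and XYZ into [ 24, 37, 63 ] and [ 63, 65, 66 ].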
If you want to keep all rows, you can use the window function version of array_agg:
select * exclude ALL_PRICE
      ,array_agg(PRICE) within group (order by PRICE_DATE)
           over (partition by COMPANY) as ALL_PRICE
from stock_price_history
match_recognize(
    partition by company
    order by price_date
    measures
        match_number() as match_number,
        price as all_price,
        first(price_date) as start_date,
        last(price_date) as end_date,
        count(*) as rows_in_sequence,
        count(row_with_price_stationary.*) as num_stationary,
        count(row_with_price_increase.*) as num_increases
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
        row_with_price_increase as price > lag(price),
        row_with_price_stationary as price = lag(price)
)
order by company
;
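The windowed version can be scoped to each match in the same way. A sketch under the same assumptions (same table and pattern, measures trimmed for brevity), adding match_number to the window's partition by:

```sql
-- Sketch: keep every matched row, each carrying the price array of its own match.
select * exclude all_price
      ,array_agg(price) within group (order by price_date)
           over (partition by company, match_number) as all_price
from stock_price_history
match_recognize(
    partition by company
    order by price_date
    measures
        match_number() as match_number,
        price as all_price               -- replaced by the windowed array above
    all rows per match
    after match skip to last row_with_price_increase
    pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
    define
        row_with_price_increase as price > lag(price),
        row_with_price_stationary as price = lag(price)
)
order by company, match_number, price_date
;
```

A row that belongs to two matches is returned once per match, each time with the array of the match it is listed under.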