匹配识别将行数据收集到单列中

huangapple go评论95阅读模式
英文:

match recognize collect row data into single column

问题

The code you provided is in SQL and you mentioned an issue with the price as all_price part in the measures clause. To collect all prices in the pattern and return them as an array into a single column, you can use the collect function. Here's the modified code:

  1. create or replace temporary table stock_price_history (company text, price_date date, price int);
  2. insert into stock_price_history values
  3. ('ABCD', '2020-10-01', 50),
  4. ('ABCD', '2020-10-02', 50),
  5. -- (insert the rest of your data here)
  6. select * from stock_price_history
  7. match_recognize(
  8. partition by company
  9. order by price_date
  10. measures
  11. match_number() as match_number,
  12. collect(price) as all_prices, -- Use collect() to gather all prices into an array
  13. first(price_date) as start_date,
  14. last(price_date) as end_date,
  15. count(*) as rows_in_sequence,
  16. count(row_with_price_stationary.*) as num_stationary,
  17. count(row_with_price_increase.*) as num_increases
  18. one row per match
  19. after match skip to last row_with_price_increase
  20. pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
  21. define
  22. row_with_price_increase as price > lag(price),
  23. row_with_price_stationary as price = lag(price)
  24. )
  25. order by company, match_number;

By using collect(price) as all_prices, you should be able to gather all prices in the pattern into a single column as an array.

英文:

I'm following the tutorial for match_recognize found here:

  1. create or replace temporary table stock_price_history (company text, price_date date, price int);
  2. insert into stock_price_history values
  3. ('ABCD', '2020-10-01', 50),
  4. ('ABCD', '2020-10-02', 50),
  5. ('ABCD', '2020-10-03', 51),
  6. ('ABCD', '2020-10-04', 51),
  7. ('ABCD', '2020-10-05', 51),
  8. ('ABCD', '2020-10-06', 52),
  9. ('ABCD', '2020-10-07', 71),
  10. ('ABCD', '2020-10-08', 80),
  11. ('ABCD', '2020-10-09', 90),
  12. ('ABCD', '2020-10-10', 63),
  13. ('XYZ' , '2020-10-01', 24),
  14. ('XYZ' , '2020-10-02', 24),
  15. ('XYZ' , '2020-10-03', 37),
  16. ('XYZ' , '2020-10-04', 63),
  17. ('XYZ' , '2020-10-05', 65),
  18. ('XYZ' , '2020-10-06', 66),
  19. ('XYZ' , '2020-10-07', 50),
  20. ('XYZ' , '2020-10-08', 54),
  21. ('XYZ' , '2020-10-09', 30),
  22. ('XYZ' , '2020-10-10', 32);
  23. select * from stock_price_history
  24. match_recognize(
  25. partition by company
  26. order by price_date
  27. measures
  28. match_number() as match_number,
  29. price as all_price,
  30. first(price_date) as start_date,
  31. last(price_date) as end_date,
  32. count(*) as rows_in_sequence,
  33. count(row_with_price_stationary.*) as num_stationary,
  34. count(row_with_price_increase.*) as num_increases
  35. one row per match
  36. after match skip to last row_with_price_increase
  37. pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
  38. define
  39. row_with_price_increase as price > lag(price),
  40. row_with_price_stationary as price = lag(price)
  41. )
  42. order by company, match_number;

The code above is my version of the tutorial code. Everything works fine except the price as all_price part in the measures clause. What I want to do is collect all prices in the pattern and return it as an array into a single column. I know I can do all rows per match to get all rows but that's not what I want.

How would I go about doing that?

答案1

得分: 1

你需要指定all rows per match,否则将失去匹配_recognize函数的信息。您可以在组内使用array_agg来获取单个数组中的价格。由于这会将行计数聚合到下面,您可能还想对每个价格的日期执行相同的操作 - 类似这样:

  1. select COMPANY
  2. ,array_agg(PRICE) within group (order by PRICE_DATE) as ALL_PRICE
  3. ,array_agg(PRICE_DATE) within group (order by PRICE_DATE) as ALL_PRICE_DATE
  4. from stock_price_history
  5. match_recognize(
  6. partition by company
  7. order by price_date
  8. measures
  9. match_number() as match_number,
  10. price as all_price,
  11. first(price_date) as start_date,
  12. last(price_date) as end_date,
  13. count(*) as rows_in_sequence,
  14. count(row_with_price_stationary.*) as num_stationary,
  15. count(row_with_price_increase.*) as num_increases
  16. all rows per match
  17. after match skip to last row_with_price_increase
  18. pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
  19. define
  20. row_with_price_increase as price > lag(price),
  21. row_with_price_stationary as price = lag(price)
  22. )
  23. group by company
  24. order by company
  25. ;

如果您想保留所有行,您可以使用window函数版本的array_agg:

  1. select * exclude ALL_PRICE
  2. ,array_agg(PRICE) within group (order by PRICE_DATE)
  3. over (partition by COMPANY) as ALL_PRICE
  4. from stock_price_history
  5. match_recognize(
  6. partition by company
  7. order by price_date
  8. measures
  9. match_number() as match_number,
  10. price as all_price,
  11. first(price_date) as start_date,
  12. last(price_date) as end_date,
  13. count(*) as rows_in_sequence,
  14. count(row_with_price_stationary.*) as num_stationary,
  15. count(row_with_price_increase.*) as num_increases
  16. all rows per match
  17. after match skip to last row_with_price_increase
  18. pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
  19. define
  20. row_with_price_increase as price > lag(price),
  21. row_with_price_stationary as price = lag(price)
  22. )
  23. order by company
  24. ;
英文:

You have to specify all rows per match or lose that information out of the match_recognize function. You can use array_agg within group to get the prices in a single array. Since this aggregates row counts down you may want to do the same for the dates of each of these prices - something like this:

  1. select COMPANY
  2. ,array_agg(PRICE) within group (order by PRICE_DATE) as ALL_PRICE
  3. ,array_agg(PRICE_DATE) within group (order by PRICE_DATE) as ALL_PRICE_DATE
  4. from stock_price_history
  5. match_recognize(
  6. partition by company
  7. order by price_date
  8. measures
  9. match_number() as match_number,
  10. price as all_price,
  11. first(price_date) as start_date,
  12. last(price_date) as end_date,
  13. count(*) as rows_in_sequence,
  14. count(row_with_price_stationary.*) as num_stationary,
  15. count(row_with_price_increase.*) as num_increases
  16. all rows per match
  17. after match skip to last row_with_price_increase
  18. pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
  19. define
  20. row_with_price_increase as price > lag(price),
  21. row_with_price_stationary as price = lag(price)
  22. )
  23. group by company
  24. order by company
  25. ;
COMPANY ALL_PRICE ALL_PRICE_DATE
ABCD [ 50, 51, 51, 51, 52, 52, 71, 80 ] [ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05", "2020-10-06", "2020-10-06", "2020-10-07", "2020-10-08" ]
XYZ [ 24, 37, 63, 63, 65, 66 ] [ "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-04", "2020-10-05", "2020-10-06" ]

If you want to keep all rows, you can use the window function version of array_agg:

  1. select * exclude ALL_PRICE
  2. ,array_agg(PRICE) within group (order by PRICE_DATE)
  3. over (partition by COMPANY) as ALL_PRICE
  4. from stock_price_history
  5. match_recognize(
  6. partition by company
  7. order by price_date
  8. measures
  9. match_number() as match_number,
  10. price as all_price,
  11. first(price_date) as start_date,
  12. last(price_date) as end_date,
  13. count(*) as rows_in_sequence,
  14. count(row_with_price_stationary.*) as num_stationary,
  15. count(row_with_price_increase.*) as num_increases
  16. all rows per match
  17. after match skip to last row_with_price_increase
  18. pattern(row_before_increase row_with_price_increase{1} row_with_price_stationary* row_with_price_increase{1})
  19. define
  20. row_with_price_increase as price > lag(price),
  21. row_with_price_stationary as price = lag(price)
  22. )
  23. order by company
  24. ;

huangapple
  • 本文由 发表于 2023年1月6日 13:44:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/75027355.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定