使用LAG函数获取前一个非相同数值,同时不考虑特定列。

huangapple go评论62阅读模式
英文:

Using LAG to get the previous non-identical value while not considering certain columns

问题

这可能是问题的扩展:使用LAG函数获取不一定在前一行的值

我有一个名为 my_table 的表,我想要创建一个新的列 previous_order_date,其基于 episode_id、file_id 和 order_date。还有另一列 offer_id,我不想考虑在 previous_order_date 计算中。

episode_id file_id offer_id order_date
1234 3000 700 10-OCT-20
1234 3000 800 10-OCT-20
1234 3000 900 10-OCT-20
1234 3000 800 11-OCT-20
1234 3000 900 11-OCT-20
1234 4000 700 10-OCT-20
1234 4000 800 10-OCT-20
1234 4000 700 11-OCT-20
4321 3000 700 10-OCT-20
4321 3000 800 10-OCT-20
4321 3000 900 10-OCT-20
4321 3000 800 11-OCT-20
4321 3000 900 11-OCT-20
4321 4000 700 10-OCT-20
4321 4000 800 10-OCT-20
4321 5000 700 10-OCT-20
4321 5000 700 11-OCT-20
4321 4000 700 11-OCT-20
4321 3000 700 12-OCT-20

查询:

create table my_table
(episode_id number,
  file_id number,
  offer_id number,
  order_date date
);

insert all
 into my_table values (1234,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (1234,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (1234,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,5000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,5000,700, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,3000,700, to_date('2020/10/12','YYYY/MM/DD'))

我想使用 LAG 来获取基于 episode_idfile_id 分区的上一个 order_date,即我不希望在不同的 offer_id 下考虑前一行的 order_date

我想要输出如下:

episode_id file_id offer_id order_date previous_order_date
1234 3000 700 10-OCT-20 null
1234 3000 800 10-OCT-20 null
1234 3000 900 10-OCT-20 null
1234 3000 800 11-OCT-20 10-OCT-20
1234 3000 900 11-OCT-20 10-OCT-20
1234 4000 700 10-OCT-20 null
1234 4000 800 10-OCT-20 null
1234 4000 700 11-OCT-20 10-OCT-20
4321 3000 700 10-OCT-20 null
4321 3000 800 10-OCT-20 null
4321 3000 900 10-OCT-20 null
4321 3000 800 11-OCT-20 10-OCT-20
4321 3000 900 11-OCT-20 10-OCT-20
4321 4000 700 10-OCT-20 null
4321 4000 800 10-OCT-20 null
4321 5000 700 10-OCT-20 null
4321 5000 700 11-OCT-20
英文:

This may be an extension to the question
Using LAG function to obtain value that is not necessarily in the previous row

I have a table my_table that I would like to create a new column previous_order_date is based on the episode_id, file_id, and order_date. There is another column offer_id that I don't want to consider for the previous_order_date calculation

episode_id file_id offer_id order_date
1234 3000 700 10-OCT-20
1234 3000 800 10-OCT-20
1234 3000 900 10-OCT-20
1234 3000 800 11-OCT-20
1234 3000 900 11-OCT-20
1234 4000 700 10-OCT-20
1234 4000 800 10-OCT-20
1234 4000 700 11-OCT-20
4321 3000 700 10-OCT-20
4321 3000 800 10-OCT-20
4321 3000 900 10-OCT-20
4321 3000 800 11-OCT-20
4321 3000 900 11-OCT-20
4321 4000 700 10-OCT-20
4321 4000 800 10-OCT-20
4321 5000 700 10-OCT-20
4321 5000 700 11-OCT-20
4321 4000 700 11-OCT-20
4321 3000 700 12-OCT-20

Query:

create table my_table
(episode_id number,
  file_id number,
  offer_id number,
  order_date date
);

insert all
 into my_table values (1234,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (1234,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (1234,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (1234,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,5000,700, to_date('2020/10/10','YYYY/MM/DD'))
 into my_table values (4321,5000,700, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
 into my_table values (4321,3000,700, to_date('2020/10/12','YYYY/MM/DD'))

I want to use LAG to get the previous order_date based on partitioned by episode_id and file_id, that is I don't want to get the previous row order_date if it's the same with a different offer_id.

I'm trying to get the output to be like this:

episode_id file_id offer_id order_date previous_order_date
1234 3000 700 10-OCT-20 null
1234 3000 800 10-OCT-20 null
1234 3000 900 10-OCT-20 null
1234 3000 800 11-OCT-20 10-OCT-20
1234 3000 900 11-OCT-20 10-OCT-20
1234 4000 700 10-OCT-20 null
1234 4000 800 10-OCT-20 null
1234 4000 700 11-OCT-20 10-OCT-20
4321 3000 700 10-OCT-20 null
4321 3000 800 10-OCT-20 null
4321 3000 900 10-OCT-20 null
4321 3000 800 11-OCT-20 10-OCT-20
4321 3000 900 11-OCT-20 10-OCT-20
4321 4000 700 10-OCT-20 null
4321 4000 800 10-OCT-20 null
4321 5000 700 10-OCT-20 null
4321 5000 700 11-OCT-20 10-OCT-20
4321 4000 700 11-OCT-20 10-OCT-20
4321 3000 700 12-OCT-20 11-OCT-20

but I've tried running this query

select episode_id
       ,file_id
       ,offer_id
      ,order_date
      ,lag(order_date) over(partition by episode_id, file_id order by order_date) as previous_order_date
from my_table

however that produces this table

episode_id file_id offer_id order_date previous_order_date
1234 3000 700 10-OCT-20 null
1234 3000 800 10-OCT-20 10-OCT-20
1234 3000 900 10-OCT-20 10-OCT-20
1234 3000 800 11-OCT-20 10-OCT-20
1234 3000 900 11-OCT-20 11-OCT-20
1234 4000 700 10-OCT-20 null
1234 4000 800 10-OCT-20 10-OCT-20
1234 4000 700 11-OCT-20 10-OCT-20
4321 3000 700 10-OCT-20 null
4321 3000 800 10-OCT-20 10-OCT-20
4321 3000 900 10-OCT-20 10-OCT-20
4321 3000 800 11-OCT-20 10-OCT-20
4321 3000 900 11-OCT-20 10-OCT-20
4321 4000 700 10-OCT-20 null
4321 4000 800 10-OCT-20 null
4321 5000 700 10-OCT-20 null
4321 5000 700 11-OCT-20 10-OCT-20
4321 4000 700 11-OCT-20 10-OCT-20
4321 3000 700 12-OCT-20 11-OCT-20

which takes the previous row order_date regardless of offer_id. It is quite a big query to get to this point, and I need the offer_id column for further calculations since I'm joining other tables based on that column. So I was wondering since the real-world application has order_date down to the minute if there was a way to have LAG look at previous rows that were at least 1 minute older, similar to the INTERVAL function.

I have tried selecting the episode_id, file_id, and order_date in a CTE and using LAG there but this duplicated the rows, one row with the order_date as the previous_order_date, and one row with a null previous_order_date. The same occurred when I tried to use DENSE_RANK for some reason.

My DB version is 19c

答案1

得分: 2

你似乎想要在LAG中使用PARTITION BY episode_id, file_id, offer_id

select episode_id,
       file_id,
       offer_id,
       order_date,
       LAG(order_date) OVER(
         PARTITION BY episode_id, file_id, offer_id
         ORDER BY order_date
       ) AS previous_order_date
FROM   my_table

或者,也许你可以使用MATCH_RECOGNIZE

select *
FROM   my_table
MATCH_RECOGNIZE(
  PARTITION BY episode_id, file_id
  ORDER BY order_date DESC, offer_id DESC
  MEASURES
    current_row.offer_id AS offer_id,
    current_row.order_date AS order_date,
    prev_day.order_date AS previous_order_date
  AFTER MATCH SKIP TO NEXT ROW
  PATTERN ( current_row {- same_day* -} prev_day? )
  DEFINE
    same_day AS TRUNC(current_row.order_date) = TRUNC(same_day.order_date)
)
ORDER BY episode_id, file_id, order_date, offer_id

对于你的示例数据,输出如下:

EPISODE_ID FILE_ID OFFER_ID ORDER_DATE PREVIOUS_ORDER_DATE
1234 3000 700 2020-10-10 00:00:00 null
1234 3000 800 2020-10-10 00:00:00 null
1234 3000 900 2020-10-10 00:00:00 null
1234 3000 800 2020-10-11 00:00:00 2020-10-10 00:00:00
1234 3000 900 2020-10-11 00:00:00 2020-10-10 00:00:00
1234 4000 700 2020-10-10 00:00:00 null
1234 4000 800 2020-10-10 00:00:00 null
1234 4000 700 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 3000 700 2020-10-10 00:00:00 null
4321 3000 800 2020-10-10 00:00:00 null
4321 3000 900 2020-10-10 00:00:00 null
4321 3000 800 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 3000 900 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 4000 700 2020-10-10 00:00:00 null
4321 4000 800 2020-10-10 00:00:00 null
4321 5000 700 2020-10-10 00:00:00 null
4321 5000 700 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 4000 700 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 3000 700 2020-10-12 00:00:00 2020-10-10 00:00:00

fiddle

英文:

You appear to want to PARTITION BY episode_id, file_id, offer_id within the LAG:

select episode_id,
       file_id,
       offer_id,
       order_date,
       LAG(order_date) OVER(
         PARTITION BY episode_id, file_id, offer_id
         ORDER BY order_date
       ) AS previous_order_date
FROM   my_table

Or, maybe, you can use MATCH_RECOGNIZE:

select *
FROM   my_table
MATCH_RECOGNIZE(
  PARTITION BY episode_id, file_id
  ORDER BY order_date DESC, offer_id DESC
  MEASURES
    current_row.offer_id AS offer_id,
    current_row.order_date AS order_date,
    prev_day.order_date AS previous_order_date
  AFTER MATCH SKIP TO NEXT ROW
  PATTERN ( current_row {- same_day* -} prev_day? )
  DEFINE
    same_day AS TRUNC(current_row.order_date) = TRUNC(same_day.order_date)
)
ORDER BY episode_id, file_id, order_date, offer_id

Which, for your sample data, output:

EPISODE_ID FILE_ID OFFER_ID ORDER_DATE PREVIOUS_ORDER_DATE
1234 3000 700 2020-10-10 00:00:00 null
1234 3000 800 2020-10-10 00:00:00 null
1234 3000 900 2020-10-10 00:00:00 null
1234 3000 800 2020-10-11 00:00:00 2020-10-10 00:00:00
1234 3000 900 2020-10-11 00:00:00 2020-10-10 00:00:00
1234 4000 700 2020-10-10 00:00:00 null
1234 4000 800 2020-10-10 00:00:00 null
1234 4000 700 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 3000 700 2020-10-10 00:00:00 null
4321 3000 800 2020-10-10 00:00:00 null
4321 3000 900 2020-10-10 00:00:00 null
4321 3000 800 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 3000 900 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 4000 700 2020-10-10 00:00:00 null
4321 4000 800 2020-10-10 00:00:00 null
4321 5000 700 2020-10-10 00:00:00 null
4321 5000 700 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 4000 700 2020-10-11 00:00:00 2020-10-10 00:00:00
4321 3000 700 2020-10-12 00:00:00 2020-10-10 00:00:00

fiddle

huangapple
  • 本文由 发表于 2023年6月9日 05:28:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/76435806.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定