英文:
Using LAG to get the previous non-identical value while not considering certain columns
问题
这可能是问题的扩展:使用LAG函数获取不一定在前一行的值
我有一个名为 my_table
的表,我想要创建一个新的列 previous_order_date
,其基于 episode_id、file_id 和 order_date。还有另一列 offer_id
,我不想考虑在 previous_order_date
计算中。
episode_id | file_id | offer_id | order_date |
---|---|---|---|
1234 | 3000 | 700 | 10-OCT-20 |
1234 | 3000 | 800 | 10-OCT-20 |
1234 | 3000 | 900 | 10-OCT-20 |
1234 | 3000 | 800 | 11-OCT-20 |
1234 | 3000 | 900 | 11-OCT-20 |
1234 | 4000 | 700 | 10-OCT-20 |
1234 | 4000 | 800 | 10-OCT-20 |
1234 | 4000 | 700 | 11-OCT-20 |
4321 | 3000 | 700 | 10-OCT-20 |
4321 | 3000 | 800 | 10-OCT-20 |
4321 | 3000 | 900 | 10-OCT-20 |
4321 | 3000 | 800 | 11-OCT-20 |
4321 | 3000 | 900 | 11-OCT-20 |
4321 | 4000 | 700 | 10-OCT-20 |
4321 | 4000 | 800 | 10-OCT-20 |
4321 | 5000 | 700 | 10-OCT-20 |
4321 | 5000 | 700 | 11-OCT-20 |
4321 | 4000 | 700 | 11-OCT-20 |
4321 | 3000 | 700 | 12-OCT-20 |
查询:
create table my_table
(episode_id number,
file_id number,
offer_id number,
order_date date
);
insert all
into my_table values (1234,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (1234,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (1234,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,5000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,5000,700, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,3000,700, to_date('2020/10/12','YYYY/MM/DD'))
我想使用 LAG 来获取基于 episode_id
和 file_id
分区的上一个 order_date
,即我不希望在不同的 offer_id
下考虑前一行的 order_date
。
我想要输出如下:
episode_id | file_id | offer_id | order_date | previous_order_date |
---|---|---|---|---|
1234 | 3000 | 700 | 10-OCT-20 | null |
1234 | 3000 | 800 | 10-OCT-20 | null |
1234 | 3000 | 900 | 10-OCT-20 | null |
1234 | 3000 | 800 | 11-OCT-20 | 10-OCT-20 |
1234 | 3000 | 900 | 11-OCT-20 | 10-OCT-20 |
1234 | 4000 | 700 | 10-OCT-20 | null |
1234 | 4000 | 800 | 10-OCT-20 | null |
1234 | 4000 | 700 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 700 | 10-OCT-20 | null |
4321 | 3000 | 800 | 10-OCT-20 | null |
4321 | 3000 | 900 | 10-OCT-20 | null |
4321 | 3000 | 800 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 900 | 11-OCT-20 | 10-OCT-20 |
4321 | 4000 | 700 | 10-OCT-20 | null |
4321 | 4000 | 800 | 10-OCT-20 | null |
4321 | 5000 | 700 | 10-OCT-20 | null |
4321 | 5000 | 700 | 11-OCT-20 |
英文:
This may be an extension to the question
Using LAG function to obtain value that is not necessarily in the previous row
I have a table my_table
that I would like to create a new column previous_order_date
is based on the episode_id, file_id, and order_date. There is another column offer_id
that I don't want to consider for the previous_order_date
calculation
episode_id | file_id | offer_id | order_date |
---|---|---|---|
1234 | 3000 | 700 | 10-OCT-20 |
1234 | 3000 | 800 | 10-OCT-20 |
1234 | 3000 | 900 | 10-OCT-20 |
1234 | 3000 | 800 | 11-OCT-20 |
1234 | 3000 | 900 | 11-OCT-20 |
1234 | 4000 | 700 | 10-OCT-20 |
1234 | 4000 | 800 | 10-OCT-20 |
1234 | 4000 | 700 | 11-OCT-20 |
4321 | 3000 | 700 | 10-OCT-20 |
4321 | 3000 | 800 | 10-OCT-20 |
4321 | 3000 | 900 | 10-OCT-20 |
4321 | 3000 | 800 | 11-OCT-20 |
4321 | 3000 | 900 | 11-OCT-20 |
4321 | 4000 | 700 | 10-OCT-20 |
4321 | 4000 | 800 | 10-OCT-20 |
4321 | 5000 | 700 | 10-OCT-20 |
4321 | 5000 | 700 | 11-OCT-20 |
4321 | 4000 | 700 | 11-OCT-20 |
4321 | 3000 | 700 | 12-OCT-20 |
Query:
create table my_table
(episode_id number,
file_id number,
offer_id number,
order_date date
);
insert all
into my_table values (1234,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (1234,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (1234,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (1234,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,3000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,3000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,3000,900, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,3000,800, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,3000,900, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,4000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,4000,800, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,5000,700, to_date('2020/10/10','YYYY/MM/DD'))
into my_table values (4321,5000,700, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,4000,700, to_date('2020/10/11','YYYY/MM/DD'))
into my_table values (4321,3000,700, to_date('2020/10/12','YYYY/MM/DD'))
I want to use LAG to get the previous order_date based on partitioned by episode_id and file_id, that is I don't want to get the previous row order_date if it's the same with a different offer_id.
I'm trying to get the output to be like this:
episode_id | file_id | offer_id | order_date | previous_order_date |
---|---|---|---|---|
1234 | 3000 | 700 | 10-OCT-20 | null |
1234 | 3000 | 800 | 10-OCT-20 | null |
1234 | 3000 | 900 | 10-OCT-20 | null |
1234 | 3000 | 800 | 11-OCT-20 | 10-OCT-20 |
1234 | 3000 | 900 | 11-OCT-20 | 10-OCT-20 |
1234 | 4000 | 700 | 10-OCT-20 | null |
1234 | 4000 | 800 | 10-OCT-20 | null |
1234 | 4000 | 700 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 700 | 10-OCT-20 | null |
4321 | 3000 | 800 | 10-OCT-20 | null |
4321 | 3000 | 900 | 10-OCT-20 | null |
4321 | 3000 | 800 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 900 | 11-OCT-20 | 10-OCT-20 |
4321 | 4000 | 700 | 10-OCT-20 | null |
4321 | 4000 | 800 | 10-OCT-20 | null |
4321 | 5000 | 700 | 10-OCT-20 | null |
4321 | 5000 | 700 | 11-OCT-20 | 10-OCT-20 |
4321 | 4000 | 700 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 700 | 12-OCT-20 | 11-OCT-20 |
but I've tried running this query
select episode_id
,file_id
,offer_id
,order_date
,lag(order_date) over(partition by episode_id, file_id order by order_date) as previous_order_date
from my_table
however that produces this table
episode_id | file_id | offer_id | order_date | previous_order_date |
---|---|---|---|---|
1234 | 3000 | 700 | 10-OCT-20 | null |
1234 | 3000 | 800 | 10-OCT-20 | 10-OCT-20 |
1234 | 3000 | 900 | 10-OCT-20 | 10-OCT-20 |
1234 | 3000 | 800 | 11-OCT-20 | 10-OCT-20 |
1234 | 3000 | 900 | 11-OCT-20 | 11-OCT-20 |
1234 | 4000 | 700 | 10-OCT-20 | null |
1234 | 4000 | 800 | 10-OCT-20 | 10-OCT-20 |
1234 | 4000 | 700 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 700 | 10-OCT-20 | null |
4321 | 3000 | 800 | 10-OCT-20 | 10-OCT-20 |
4321 | 3000 | 900 | 10-OCT-20 | 10-OCT-20 |
4321 | 3000 | 800 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 900 | 11-OCT-20 | 10-OCT-20 |
4321 | 4000 | 700 | 10-OCT-20 | null |
4321 | 4000 | 800 | 10-OCT-20 | null |
4321 | 5000 | 700 | 10-OCT-20 | null |
4321 | 5000 | 700 | 11-OCT-20 | 10-OCT-20 |
4321 | 4000 | 700 | 11-OCT-20 | 10-OCT-20 |
4321 | 3000 | 700 | 12-OCT-20 | 11-OCT-20 |
which takes the previous row order_date regardless of offer_id. It is quite a big query to get to this point, and I need the offer_id column for further calculations since I'm joining other tables based on that column. So I was wondering since the real-world application has order_date down to the minute if there was a way to have LAG look at previous rows that were at least 1 minute older, similar to the INTERVAL function.
I have tried selecting the episode_id, file_id, and order_date in a CTE and using LAG there but this duplicated the rows, one row with the order_date as the previous_order_date, and one row with a null previous_order_date. The same occurred when I tried to use DENSE_RANK for some reason.
My DB version is 19c
答案1
得分: 2
你似乎想要在LAG
中使用PARTITION BY episode_id, file_id, offer_id
:
select episode_id,
file_id,
offer_id,
order_date,
LAG(order_date) OVER(
PARTITION BY episode_id, file_id, offer_id
ORDER BY order_date
) AS previous_order_date
FROM my_table
或者,也许你可以使用MATCH_RECOGNIZE
:
select *
FROM my_table
MATCH_RECOGNIZE(
PARTITION BY episode_id, file_id
ORDER BY order_date DESC, offer_id DESC
MEASURES
current_row.offer_id AS offer_id,
current_row.order_date AS order_date,
prev_day.order_date AS previous_order_date
AFTER MATCH SKIP TO NEXT ROW
PATTERN ( current_row {- same_day* -} prev_day? )
DEFINE
same_day AS TRUNC(current_row.order_date) = TRUNC(same_day.order_date)
)
ORDER BY episode_id, file_id, order_date, offer_id
对于你的示例数据,输出如下:
EPISODE_ID | FILE_ID | OFFER_ID | ORDER_DATE | PREVIOUS_ORDER_DATE |
---|---|---|---|---|
1234 | 3000 | 700 | 2020-10-10 00:00:00 | null |
1234 | 3000 | 800 | 2020-10-10 00:00:00 | null |
1234 | 3000 | 900 | 2020-10-10 00:00:00 | null |
1234 | 3000 | 800 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
1234 | 3000 | 900 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
1234 | 4000 | 700 | 2020-10-10 00:00:00 | null |
1234 | 4000 | 800 | 2020-10-10 00:00:00 | null |
1234 | 4000 | 700 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 3000 | 700 | 2020-10-10 00:00:00 | null |
4321 | 3000 | 800 | 2020-10-10 00:00:00 | null |
4321 | 3000 | 900 | 2020-10-10 00:00:00 | null |
4321 | 3000 | 800 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 3000 | 900 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 4000 | 700 | 2020-10-10 00:00:00 | null |
4321 | 4000 | 800 | 2020-10-10 00:00:00 | null |
4321 | 5000 | 700 | 2020-10-10 00:00:00 | null |
4321 | 5000 | 700 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 4000 | 700 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 3000 | 700 | 2020-10-12 00:00:00 | 2020-10-10 00:00:00 |
英文:
You appear to want to PARTITION BY episode_id, file_id, offer_id
within the LAG
:
select episode_id,
file_id,
offer_id,
order_date,
LAG(order_date) OVER(
PARTITION BY episode_id, file_id, offer_id
ORDER BY order_date
) AS previous_order_date
FROM my_table
Or, maybe, you can use MATCH_RECOGNIZE
:
select *
FROM my_table
MATCH_RECOGNIZE(
PARTITION BY episode_id, file_id
ORDER BY order_date DESC, offer_id DESC
MEASURES
current_row.offer_id AS offer_id,
current_row.order_date AS order_date,
prev_day.order_date AS previous_order_date
AFTER MATCH SKIP TO NEXT ROW
PATTERN ( current_row {- same_day* -} prev_day? )
DEFINE
same_day AS TRUNC(current_row.order_date) = TRUNC(same_day.order_date)
)
ORDER BY episode_id, file_id, order_date, offer_id
Which, for your sample data, output:
EPISODE_ID | FILE_ID | OFFER_ID | ORDER_DATE | PREVIOUS_ORDER_DATE |
---|---|---|---|---|
1234 | 3000 | 700 | 2020-10-10 00:00:00 | null |
1234 | 3000 | 800 | 2020-10-10 00:00:00 | null |
1234 | 3000 | 900 | 2020-10-10 00:00:00 | null |
1234 | 3000 | 800 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
1234 | 3000 | 900 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
1234 | 4000 | 700 | 2020-10-10 00:00:00 | null |
1234 | 4000 | 800 | 2020-10-10 00:00:00 | null |
1234 | 4000 | 700 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 3000 | 700 | 2020-10-10 00:00:00 | null |
4321 | 3000 | 800 | 2020-10-10 00:00:00 | null |
4321 | 3000 | 900 | 2020-10-10 00:00:00 | null |
4321 | 3000 | 800 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 3000 | 900 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 4000 | 700 | 2020-10-10 00:00:00 | null |
4321 | 4000 | 800 | 2020-10-10 00:00:00 | null |
4321 | 5000 | 700 | 2020-10-10 00:00:00 | null |
4321 | 5000 | 700 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 4000 | 700 | 2020-10-11 00:00:00 | 2020-10-10 00:00:00 |
4321 | 3000 | 700 | 2020-10-12 00:00:00 | 2020-10-10 00:00:00 |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论