How to correctly get the second row of a partition by
Question
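In the first method, ORDER BY tid, created_date is applied in the subquery and in the outer query, but not inside the window function. ROW_NUMBER() OVER (PARTITION BY tid) therefore has no defined order within each tid partition, so BigQuery may number the rows of a partition in any order. That is why the numbering is right for some tids but reversed for 296644550.
The second method partitions by both tid and created_date. No two rows share the same tid and created_date, so every partition contains exactly one row and every row is assigned row number 1.
To fix this, move the ordering into the window function's OVER clause. The following modified query returns the latest record for each tid: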
WITH RankedData AS (
  SELECT
    created_date,
    tid,
    fmid,
    ROW_NUMBER() OVER (PARTITION BY tid ORDER BY created_date DESC) AS row_num
  FROM
    table
)
SELECT
  created_date,
  tid,
  fmid
FROM
  RankedData
WHERE
  row_num = 1;
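This query partitions the rows by 'tid', sorts each partition by 'created_date' in descending order, and assigns a row number to each record within its partition. It then keeps only the rows with row number 1, which is the latest record in each 'tid' partition.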
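As a quick local check, the sketch below runs the same query shape through Python's built-in sqlite3 module. SQLite is only a stand-in for BigQuery here (an assumption; SQLite 3.25+ accepts the same ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) syntax). The table name events is hypothetical, because the question's placeholder name table is a reserved word in SQLite. Timestamps are stored as ISO-8601 text without the UTC suffix so they sort chronologically, and ORDER BY tid is added only to make the output order deterministic. The two rows for tid 296644550 are inserted with the later one first, mimicking the reversed order from the question.

```python
import sqlite3

# Sample rows copied from the question: (created_date, tid, fmid).
# ISO-8601 text sorts in chronological order, so it works as a timestamp here.
ROWS = [
    ("2023-02-03 22:11:07", 233330088, 4),
    ("2023-02-04 10:07:06", 233330088, 18),
    ("2023-02-11 11:06:42", 284447799, 4),
    ("2023-02-11 11:22:13", 284447799, 18),
    ("2023-02-10 19:32:08", 296644550, 18),  # later row inserted first on purpose
    ("2023-02-10 10:39:23", 296644550, 4),
]

# Same shape as the BigQuery query above; "events" is a hypothetical table name.
LATEST_SQL = """
WITH RankedData AS (
    SELECT created_date, tid, fmid,
           ROW_NUMBER() OVER (PARTITION BY tid ORDER BY created_date DESC) AS row_num
    FROM events
)
SELECT created_date, tid, fmid
FROM RankedData
WHERE row_num = 1
ORDER BY tid
"""


def latest_per_tid(rows):
    """Return the newest (created_date, tid, fmid) row for each tid."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (created_date TEXT, tid INTEGER, fmid INTEGER)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(LATEST_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_tid(ROWS):
        print(row)
```

With the sample data this prints one row per tid, each with fmid 18, which is the later version of every record regardless of insertion order.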
First of all, I'm using BigQuery. In my table I have some duplicated rows due to a change in one of the columns (called fmid). When this happens, I have to get the latest record, which is defined by the time (called created_date). The only thing these duplicated records have in common is the id (called tid).
I have tried two different options to mark which row is the first and which one is the latest, so that I could then remove the rows I don't need.
First one:
SELECT *, row_number() OVER( PARTITION BY tid) as number
FROM (SELECT created_date,tid, fmid
FROM table
ORDER BY tid,created_date)
ORDER BY tid, created_date
Results:
Sometimes it's correct, but the numbering for tid 296644550 is reversed.
Row  created_date             tid        fmid  number
1    2023-02-03 22:11:07 UTC  233330088  4     1
2    2023-02-04 10:07:06 UTC  233330088  18    2
3    2023-02-11 11:06:42 UTC  284447799  4     1
4    2023-02-11 11:22:13 UTC  284447799  18    2
5    2023-02-10 10:39:23 UTC  296644550  4     2
6    2023-02-10 19:32:08 UTC  296644550  18    1
Second one:
SELECT *, row_number() OVER( PARTITION BY tid,created_date) as number
FROM (SELECT created_date,tid, fmid
FROM table
ORDER BY tid,created_date)
ORDER BY tid, created_date
Results:
The column "number" is 1 for every row.
Row  created_date             tid        fmid  number
1    2023-02-03 22:11:07 UTC  233330088  4     1
2    2023-02-04 10:07:06 UTC  233330088  18    1
3    2023-02-11 11:06:42 UTC  284447799  4     1
4    2023-02-11 11:22:13 UTC  284447799  18    1
5    2023-02-10 10:39:23 UTC  296644550  4     1
6    2023-02-10 19:32:08 UTC  296644550  18    1
Any idea of what's wrong?
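The all-1 result of the second attempt can be reproduced locally. In the sketch below, Python's built-in sqlite3 stands in for BigQuery (an assumption; the OVER syntax used is the same), and a hypothetical table named events holds the six sample rows. Adding COUNT(*) over the same window shows that each (tid, created_date) partition contains exactly one row, so ROW_NUMBER() can only ever return 1.

```python
import sqlite3

# Sample rows copied from the question: (created_date, tid, fmid).
ROWS = [
    ("2023-02-03 22:11:07", 233330088, 4),
    ("2023-02-04 10:07:06", 233330088, 18),
    ("2023-02-11 11:06:42", 284447799, 4),
    ("2023-02-11 11:22:13", 284447799, 18),
    ("2023-02-10 10:39:23", 296644550, 4),
    ("2023-02-10 19:32:08", 296644550, 18),
]

# The window from the second attempt, plus COUNT(*) over the same window to
# expose how many rows each (tid, created_date) partition holds.
# "events" is a hypothetical table name standing in for the question's table.
SECOND_ATTEMPT_SQL = """
SELECT created_date, tid, fmid,
       ROW_NUMBER() OVER (PARTITION BY tid, created_date) AS number,
       COUNT(*) OVER (PARTITION BY tid, created_date) AS partition_size
FROM events
ORDER BY tid, created_date
"""


def second_attempt(rows):
    """Run the second attempt's window and return every numbered row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (created_date TEXT, tid INTEGER, fmid INTEGER)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(SECOND_ATTEMPT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for created_date, tid, fmid, number, size in second_attempt(ROWS):
        print(created_date, tid, fmid, number, size)
```

Every row comes back with number 1 and partition_size 1: partitioning by a column that is unique within each tid leaves nothing to number.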
Answer 1
Score: 0
The sorting has to happen inside the window of row_number (its OVER clause), so that you get the correct number:
SELECT created_date, tid, fmid
FROM (
  SELECT created_date, tid, fmid,
    row_number() OVER (PARTITION BY tid ORDER BY created_date) AS number
  FROM table
)
WHERE number = 2
ORDER BY tid
Or as a CTE:
WITH CTE AS (
  SELECT created_date, tid, fmid,
    row_number() OVER (PARTITION BY tid ORDER BY created_date) AS number
  FROM table
)
SELECT created_date, tid, fmid
FROM CTE
WHERE number = 2
ORDER BY tid
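A minimal local sketch of the answer's CTE, assuming Python's built-in sqlite3 as a stand-in for BigQuery and a hypothetical table named events. The later row for tid 296644550 is inserted first, so the sketch shows that the ORDER BY inside OVER fixes the numbering no matter what order the rows arrive in.

```python
import sqlite3

# Sample rows from the question, with tid 296644550's later row inserted first
# to mimic the out-of-order numbering seen in the first attempt.
ROWS = [
    ("2023-02-03 22:11:07", 233330088, 4),
    ("2023-02-04 10:07:06", 233330088, 18),
    ("2023-02-11 11:06:42", 284447799, 4),
    ("2023-02-11 11:22:13", 284447799, 18),
    ("2023-02-10 19:32:08", 296644550, 18),
    ("2023-02-10 10:39:23", 296644550, 4),
]

# The answer's CTE; "events" is a hypothetical table name.
SECOND_ROW_SQL = """
WITH CTE AS (
    SELECT created_date, tid, fmid,
           ROW_NUMBER() OVER (PARTITION BY tid ORDER BY created_date) AS number
    FROM events
)
SELECT created_date, tid, fmid
FROM CTE
WHERE number = 2
ORDER BY tid
"""


def second_row_per_tid(rows):
    """Return the second-oldest (created_date, tid, fmid) row for each tid."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (created_date TEXT, tid INTEGER, fmid INTEGER)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(SECOND_ROW_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in second_row_per_tid(ROWS):
        print(row)
```

Each tid in the sample has exactly two rows, so number = 2 is also the latest row. A tid with only one row has no row number 2 and is left out entirely; if the goal is the latest record regardless of how many versions exist, ordering by created_date DESC and keeping number = 1 avoids that.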