如何正确获取分区的第二行

huangapple go评论99阅读模式
英文:

How to correctly get the second row of a partition by

问题

第一种方法中,问题可能出现在仅按照 'tid' 和 'created_date' 列对数据进行排序时。这可能导致在 'tid' 相同的情况下,无法准确地确定哪个记录是最新的。对于 'tid' 相同的记录,它们的 'created_date' 可能会出现混淆。

第二种方法也存在相同的问题,因为它仍然只根据 'tid' 和 'created_date' 进行排序,并且分配了相同的行号。

要解决这个问题,你可以使用窗口函数来找到最新的记录。下面是一个修改后的查询:

  1. WITH RankedData AS (
  2. SELECT
  3. created_date,
  4. tid,
  5. fmid,
  6. ROW_NUMBER() OVER (PARTITION BY tid ORDER BY created_date DESC) AS row_num
  7. FROM
  8. table
  9. )
  10. SELECT
  11. created_date,
  12. tid,
  13. fmid
  14. FROM
  15. RankedData
  16. WHERE
  17. row_num = 1;

这个查询将根据 'tid' 分区,并按 'created_date' 降序排序,然后为每个分区中的记录分配行号。最后,它仅选择行号为 1 的记录,这将是每个 'tid' 分区中的最新记录。

英文:

First of all, I'm using BigQuery. In my table I have some duplicated rows due to a change in one of the columns (called fmid). When this happens, I have to get the latest record, which is defined by the time (called created_date). The only thing these duplicated records have in common is the id (called tid).

I have tried two different options to set which row is the first and which one is the latest, so then I could remove what I want.

First one:

  1. SELECT *, row_number() OVER( PARTITION BY tid) as number
  2. FROM (SELECT created_date,tid, fmid
  3. FROM table
  4. ORDER BY tid,created_date)
  5. ORDER BY tid, created_date

Results:
Sometimes it's correct, but the order of the id 296644550 is not right.

  1. 1 2023-02-03 22:11:07 UTC 233330088 4 1
  2. 2 2023-02-04 10:07:06 UTC 233330088 18 2
  3. 3 2023-02-11 11:06:42 UTC 284447799 4 1
  4. 4 2023-02-11 11:22:13 UTC 284447799 18 2
  5. 5 2023-02-10 10:39:23 UTC 296644550 4 2
  6. 6 2023-02-10 19:32:08 UTC 296644550 18 1

Second one:

  1. SELECT *, row_number() OVER( PARTITION BY tid,created_date) as number
  2. FROM (SELECT created_date,tid, fmid
  3. FROM table
  4. ORDER BY tid,created_date)
  5. ORDER BY tid, created_date

Results:
The column "number" gets the same result.

  1. Row created_date tid fmid number
  2. 1 2023-02-03 22:11:07 UTC 233330088 4 1
  3. 2 2023-02-04 10:07:06 UTC 233330088 18 1
  4. 3 2023-02-11 11:06:42 UTC 284447799 4 1
  5. 4 2023-02-11 11:22:13 UTC 284447799 18 1
  6. 5 2023-02-10 10:39:23 UTC 296644550 4 1
  7. 6 2023-02-10 19:32:08 UTC 296644550 18 1

Any idea of what's wrong?

答案1

得分: 0

排序必须在row_number的窗口内进行,以便获得正确的编号

  1. SELECT created_date, tid, fmid
  2. FROM (
  3. SELECT created_date, tid, fmid,
  4. row_number() OVER (PARTITION BY tid ORDER BY created_date) as number
  5. FROM table
  6. )
  7. WHERE number = 2
  8. ORDER BY tid

或者作为CTE:

  1. WITH CTE AS (
  2. SELECT created_date, tid, fmid,
  3. row_number() OVER (PARTITION BY tid ORDER BY created_date) as number
  4. FROM table
  5. )
  6. SELECT created_date, tid, fmid
  7. FROM CTE
  8. WHERE number = 2
  9. ORDER BY tid
英文:

the sorting has to happen in the window of row_number, so that you get the correct number

  1. SELECT created_date,tid, fmid
  2. FROM (SELECT created_date,tid, fmid
  3. , row_number() OVER( PARTITION BY tid ORDER BY created_date) as number
  4. FROM table
  5. )
  6. WHERE number = 2
  7. ORDER BY tid

or as CTE

  1. WITH CTE AS (
  2. SELECT created_date,tid, fmid
  3. , row_number() OVER( PARTITION BY tid ORDER BY created_date) as number
  4. FROM table )
  5. SELECT created_date,tid, fmid
  6. FROM CTE
  7. WHERE number = 2
  8. ORDER BY tid

huangapple
  • 本文由 发表于 2023年6月26日 16:17:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76554799.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定