Postgres query that ignores the first 2 entries of day per group

Question

I want to write a PostgreSQL SELECT query that ignores the first 2 entries of each day per queue, where a "day" is defined as starting at 18:00.

For example, if my data looks like this:

id  queue  timestamp (bigint)
1   q1     1673721000000000000  (2023-01-14 18:30:00) - first entry of day for q1
2   q1     1673728200000000000  (2023-01-14 20:30:00)
3   q1     1673760600000000000  (2023-01-15 05:30:00)
4   q1     1673806000000000000  (2023-01-15 18:10:00) - first entry of day for q1
5   q2     1673721000000000000  (2023-01-14 18:30:00) - first entry of day for q2
6   q2     1673728200000000000  (2023-01-14 20:30:00)
7   q2     1673760600000000000  (2023-01-15 05:30:00)
8   q2     1673802600000000000  (2023-01-15 17:10:00)
9   q2     1674067800000000000  (2023-01-18 18:50:00) - first entry of day for q2
10  q2     1674075000000000000  (2023-01-18 20:50:00)
11  q2     1674096600000000000  (2023-01-19 02:50:00)
12  q2     1674132600000000000  (2023-01-19 12:50:00)
13  q3     1673721000000000000  (2023-01-14 18:30:00) - first entry of day for q3
14  q3     1673728200000000000  (2023-01-14 20:30:00)

The query should return ids 3, 7, 8, 11, and 12:

id  queue  timestamp
3   q1     1673760600000000000
7   q2     1673760600000000000
8   q2     1673802600000000000
11  q2     1674096600000000000
12  q2     1674132600000000000

I've tried using row_number() over a partition, but I'm having trouble getting the filtering right.

-- Of course this doesn't work: it skips only the first 2 rows per queue over all time, not per day starting at 18:00.

SELECT *
FROM (
  SELECT row_number() OVER (PARTITION BY queue ORDER BY "timestamp") AS row_n,
         *
  FROM mytable
) results
WHERE results.row_n > 2

-- Gives the time of day of the timestamp

CAST(to_timestamp("timestamp" / 1000000000) AS time)

Any help is appreciated. Thanks!

Answer 1

Score: 1

If ts is your timestamp, you should partition on the following expression:

partition by queue, (ts - interval '18' hour)::date

Subtracting 18 hours shifts the 18:00 day start to midnight, and casting the result to date then yields a single value per day.
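
To make the boundary behaviour of that expression concrete, here is a minimal sketch. The three literal timestamps and the column alias day_bucket are made up for illustration and are not part of the original answer.

```sql
-- Illustrative values only: two timestamps on either side of the 18:00
-- boundary, plus one just before the next boundary.
SELECT ts,
       (ts - interval '18 hours')::date AS day_bucket
FROM (VALUES
        (timestamp '2023-01-15 17:59:59'),  -- still belongs to the day that began 2023-01-14 18:00
        (timestamp '2023-01-15 18:00:00'),  -- first instant of the day that begins 2023-01-15 18:00
        (timestamp '2023-01-16 17:59:59')   -- last second of that same day
     ) AS t(ts);

-- ts                  | day_bucket
-- 2023-01-15 17:59:59 | 2023-01-14
-- 2023-01-15 18:00:00 | 2023-01-15
-- 2023-01-16 17:59:59 | 2023-01-15
```

The bucket is labelled with the calendar date on which the 18:00 day starts, which is all the partition needs: rows sharing a bucket and a queue are numbered together.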

Sample Query

The query below computes the row_number within each queue and each day starting at 18:00, so the first and second rows of every day can be identified (the rest is left out as not important).

with dt as (
  -- generated sample data: one row every 3 hours for q1, every 5 hours for q2
  select 'q1' queue, date '2023-01-01' + n * interval '3' hour ts from generate_series(1, 20) t(n)
  union all
  select 'q2' queue, date '2023-01-01' + n * interval '5' hour ts from generate_series(1, 20) t(n)
)
select
  queue, ts,
  row_number() over (partition by queue, (ts - interval '18' hour)::date order by ts) as rn,
  (ts - interval '18' hour)::date
from dt
order by queue, ts

queue|ts                 |rn|date      |
-----+-------------------+--+----------+
q1   |2023-01-01 03:00:00| 1|2022-12-31|
q1   |2023-01-01 06:00:00| 2|2022-12-31|
q1   |2023-01-01 09:00:00| 3|2022-12-31|
...
q1   |2023-01-01 18:00:00| 1|2023-01-01|
q1   |2023-01-01 21:00:00| 2|2023-01-01|
q1   |2023-01-02 00:00:00| 3|2023-01-01|
...
q2   |2023-01-01 05:00:00| 1|2022-12-31|
q2   |2023-01-01 10:00:00| 2|2022-12-31|
q2   |2023-01-01 15:00:00| 3|2022-12-31|
q2   |2023-01-01 20:00:00| 1|2023-01-01|
q2   |2023-01-02 01:00:00| 2|2023-01-01|
q2   |2023-01-02 06:00:00| 3|2023-01-01|
...
q2   |2023-01-02 21:00:00| 1|2023-01-02|
q2   |2023-01-03 02:00:00| 2|2023-01-02|
q2   |2023-01-03 07:00:00| 3|2023-01-02|
...
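
Putting the pieces together for the table in the question, the following is one possible sketch rather than a definitive solution. It assumes that the bigint column holds nanoseconds since the Unix epoch and that the 18:00 boundary is meant in UTC (the annotated times in the question match UTC). The CTE names converted and numbered are made up here, and the inline mytable CTE only reproduces the question's sample rows so that the statement runs on its own.

```sql
-- Stand-in for the real table: drop this first CTE to query the actual mytable.
WITH mytable (id, queue, "timestamp") AS (
  VALUES
    (1,  'q1', 1673721000000000000),
    (2,  'q1', 1673728200000000000),
    (3,  'q1', 1673760600000000000),
    (4,  'q1', 1673806000000000000),
    (5,  'q2', 1673721000000000000),
    (6,  'q2', 1673728200000000000),
    (7,  'q2', 1673760600000000000),
    (8,  'q2', 1673802600000000000),
    (9,  'q2', 1674067800000000000),
    (10, 'q2', 1674075000000000000),
    (11, 'q2', 1674096600000000000),
    (12, 'q2', 1674132600000000000),
    (13, 'q3', 1673721000000000000),
    (14, 'q3', 1673728200000000000)
),
converted AS (
  SELECT id,
         queue,
         "timestamp",
         -- Assumption: nanoseconds since the epoch, day boundary in UTC.
         -- Integer division drops the sub-second part, which does not
         -- affect the day bucket.
         to_timestamp("timestamp" / 1000000000) AT TIME ZONE 'UTC' AS ts
  FROM mytable
),
numbered AS (
  SELECT id,
         queue,
         "timestamp",
         row_number() OVER (
           PARTITION BY queue, (ts - interval '18 hours')::date
           ORDER BY "timestamp"          -- order by the raw value to keep full precision
         ) AS rn
  FROM converted
)
SELECT id, queue, "timestamp"
FROM numbered
WHERE rn > 2                             -- skip the first 2 entries of each 18:00 day per queue
ORDER BY id;
```

With the sample rows this should return ids 3, 7, 8, 11 and 12, matching the expected result in the question. If the day boundary is meant in another time zone, replace 'UTC' with that zone name.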

huangapple
  • Published on April 17, 2023, 22:05:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76036035.html