Postgres query that ignores the first 2 entries of each day per group
Question
I want to write a Postgres SELECT query that ignores the first 2 entries of each day per queue, where a "day" is defined as starting at 18:00.
For example, if my data looks like this:
id | queue | timestamp (bigint, ns since epoch) | UTC time | note
---|---|---|---|---
1 | q1 | 1673721000000000000 | 2023-01-14 18:30:00 | first entry of day for q1
2 | q1 | 1673728200000000000 | 2023-01-14 20:30:00
3 | q1 | 1673760600000000000 | 2023-01-15 05:30:00
4 | q1 | 1673806000000000000 | 2023-01-15 18:06:40 | first entry of day for q1
5 | q2 | 1673721000000000000 | 2023-01-14 18:30:00 | first entry of day for q2
6 | q2 | 1673728200000000000 | 2023-01-14 20:30:00
7 | q2 | 1673760600000000000 | 2023-01-15 05:30:00
8 | q2 | 1673802600000000000 | 2023-01-15 17:10:00
9 | q2 | 1674067800000000000 | 2023-01-18 18:50:00 | first entry of day for q2
10 | q2 | 1674075000000000000 | 2023-01-18 20:50:00
11 | q2 | 1674096600000000000 | 2023-01-19 02:50:00
12 | q2 | 1674132600000000000 | 2023-01-19 12:50:00
13 | q3 | 1673721000000000000 | 2023-01-14 18:30:00 | first entry of day for q3
14 | q3 | 1673728200000000000 | 2023-01-14 20:30:00
The query should return ids 3, 7, 8, 11, and 12:
id | queue | timestamp |
---|---|---|
3 | q1 | 1673760600000000000 |
7 | q2 | 1673760600000000000 |
8 | q2 | 1673802600000000000 |
11 | q2 | 1674096600000000000 |
12 | q2 | 1674132600000000000 |
I've tried using row_number() over a partition, but I'm having trouble getting the filtering right.
-- Of course this doesn't work: it only skips the first 2 rows per queue of all time,
-- not the first 2 per day starting at 18:00.
SELECT *
FROM (
    SELECT row_number() OVER (PARTITION BY queue ORDER BY "timestamp") AS row_n,
           *
    FROM mytable
) results
WHERE results.row_n > 2;

-- Gives the time of day of the timestamp (nanoseconds since epoch -> timestamptz -> time)
SELECT CAST(to_timestamp("timestamp" / 1000000000) AS time)
FROM mytable;
Any help is appreciated. Thanks!
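The sketch below shows concretely why the attempt above falls short. It runs the same all-time row_number() query against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for Postgres, so it assumes SQLite 3.25+ for window functions. The helper `run` and the constant names are hypothetical, and `mytable` is taken from the question.

```python
import sqlite3

# Sample data from the question: (id, queue, timestamp in ns since epoch).
ROWS = [
    (1, "q1", 1673721000000000000), (2, "q1", 1673728200000000000),
    (3, "q1", 1673760600000000000), (4, "q1", 1673806000000000000),
    (5, "q2", 1673721000000000000), (6, "q2", 1673728200000000000),
    (7, "q2", 1673760600000000000), (8, "q2", 1673802600000000000),
    (9, "q2", 1674067800000000000), (10, "q2", 1674075000000000000),
    (11, "q2", 1674096600000000000), (12, "q2", 1674132600000000000),
    (13, "q3", 1673721000000000000), (14, "q3", 1673728200000000000),
]

# The attempt from the question: row numbers restart per queue only, never per day.
ALL_TIME_QUERY = """
SELECT id
FROM (
    SELECT row_number() OVER (PARTITION BY queue ORDER BY "timestamp") AS row_n, *
    FROM mytable
) AS results
WHERE results.row_n > 2
ORDER BY id
"""


def run(query, rows):
    """Load rows into a throwaway in-memory SQLite table and return the selected ids."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE mytable (id INTEGER, queue TEXT, "timestamp" INTEGER)')
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return [row[0] for row in con.execute(query)]
    finally:
        con.close()


print(run(ALL_TIME_QUERY, ROWS))  # [3, 4, 7, 8, 9, 10, 11, 12]
```

Ids 4, 9 and 10 survive because each queue is numbered only once. Id 4 should be dropped as the first entry of its own day, and ids 9 and 10 as the first two entries of 2023-01-18.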
Answer 1
Score: 1
If ts is your timestamp, you should partition on the following expression:
partition by queue, (ts - interval'18' hour)::date
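Note that you subtract 18 hours (because the day starts at 18:00) and cast to date. Every row from 18:00 on one day up to just before 18:00 the next day then maps to a single value. To skip the first two rows of each day, compute row_number() over this partition in a subquery and keep only the rows with rn > 2.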
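The sketch below is a quick way to sanity-check the day-shift rule outside the database. It applies the same "subtract 18 hours, then take the date" step to a few of the question's timestamps. It assumes the bigint column holds nanoseconds since the Unix epoch and that the day boundary is 18:00 UTC. The names `business_day` and `DAY_START` are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: timestamps are nanoseconds since the Unix epoch, interpreted as UTC.
DAY_START = timedelta(hours=18)


def business_day(ts_ns: int) -> date:
    """Return the 18:00-to-18:00 'day' a timestamp belongs to, labelled by its start date."""
    # Whole seconds are enough here; integer division drops the sub-second part.
    moment = datetime.fromtimestamp(ts_ns // 1_000_000_000, tz=timezone.utc)
    # Same idea as (ts - interval '18' hour)::date: shift back 18 hours, keep the date.
    return (moment - DAY_START).date()


for ts_ns in (1673721000000000000, 1673760600000000000,
              1673802600000000000, 1673806000000000000):
    print(ts_ns, business_day(ts_ns))
```

The 18:30 entry, the 05:30 entry the next morning, and the 17:10 entry the next afternoon all land on 2023-01-14. 1673806000000000000 (just after 18:00 on 2023-01-15) starts a new day.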
Sample Query
The following query computes the row_number within each queue and each day starting at 18:00. (The output is abbreviated; only the first three rows of each day are shown.)
with dt as (
select 'q1' queue, date'2023-01-01' + n * interval '3' hour ts from generate_series(1,20) t(n) union all
select 'q2' queue, date'2023-01-01' + n * interval '5' hour ts from generate_series(1,20) t(n)
)
select
queue, ts,
row_number() over (partition by queue, (ts - interval'18' hour)::date order by ts) as rn,
(ts - interval'18' hour)::date
from dt
order by queue, ts;
queue|ts |rn|date |
-----+-------------------+--+----------+
q1 |2023-01-01 03:00:00| 1|2022-12-31|
q1 |2023-01-01 06:00:00| 2|2022-12-31|
q1 |2023-01-01 09:00:00| 3|2022-12-31|
...
q1 |2023-01-01 18:00:00| 1|2023-01-01|
q1 |2023-01-01 21:00:00| 2|2023-01-01|
q1 |2023-01-02 00:00:00| 3|2023-01-01|
...
q2 |2023-01-01 05:00:00| 1|2022-12-31|
q2 |2023-01-01 10:00:00| 2|2022-12-31|
q2 |2023-01-01 15:00:00| 3|2022-12-31|
q2 |2023-01-01 20:00:00| 1|2023-01-01|
q2 |2023-01-02 01:00:00| 2|2023-01-01|
q2 |2023-01-02 06:00:00| 3|2023-01-01|
...
q2 |2023-01-02 21:00:00| 1|2023-01-02|
q2 |2023-01-03 02:00:00| 2|2023-01-02|
q2 |2023-01-03 07:00:00| 3|2023-01-02|
...
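The sketch below puts the pieces together for the question's data. It partitions by queue and by the shifted date, numbers the rows by timestamp, and keeps rn > 2. It runs in Python's built-in sqlite3 module as a stand-in for Postgres, so the date expression is SQLite's date(..., 'unixepoch') rather than Postgres's (ts - interval '18' hour)::date. The helper `run` and the constant names are hypothetical, and `mytable` comes from the question.

In Postgres the partition expression would presumably be (to_timestamp("timestamp" / 1000000000) - interval '18' hour)::date. Keep in mind that casting a timestamptz to date follows the session TimeZone setting, so if the boundary should be 18:00 UTC, apply AT TIME ZONE 'UTC' before subtracting the interval.

```python
import sqlite3

# Sample data from the question: (id, queue, timestamp in ns since epoch).
ROWS = [
    (1, "q1", 1673721000000000000), (2, "q1", 1673728200000000000),
    (3, "q1", 1673760600000000000), (4, "q1", 1673806000000000000),
    (5, "q2", 1673721000000000000), (6, "q2", 1673728200000000000),
    (7, "q2", 1673760600000000000), (8, "q2", 1673802600000000000),
    (9, "q2", 1674067800000000000), (10, "q2", 1674075000000000000),
    (11, "q2", 1674096600000000000), (12, "q2", 1674132600000000000),
    (13, "q3", 1673721000000000000), (14, "q3", 1673728200000000000),
]

# SQLite stand-in for partition by queue, (ts - interval '18' hour)::date:
# convert ns to whole seconds, shift back 18 hours, then take the UTC calendar date.
DAILY_QUERY = """
SELECT id
FROM (
    SELECT id,
           row_number() OVER (
               PARTITION BY queue,
                            date("timestamp" / 1000000000 - 18 * 3600, 'unixepoch')
               ORDER BY "timestamp"
           ) AS rn
    FROM mytable
) AS ranked
WHERE rn > 2
ORDER BY id
"""


def run(query, rows):
    """Load rows into a throwaway in-memory SQLite table and return the selected ids."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE mytable (id INTEGER, queue TEXT, "timestamp" INTEGER)')
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return [row[0] for row in con.execute(query)]
    finally:
        con.close()


print(run(DAILY_QUERY, ROWS))  # [3, 7, 8, 11, 12]
```

Id 8 (17:10 on 2023-01-15) stays in the day that began at 18:00 on 2023-01-14, so it is that day's fourth row for q2 and is kept. Id 4 is alone in its day and is dropped.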