How to group consecutive timestamps in an SQL table?
Question
I have an SQL table in SQL Server with timestamps that looks like this:
│ Timestamps
──┼────────────────────
1 │ 2022-09-23 15:01:00
2 │ 2022-09-23 15:02:00
3 │ 2022-10-03 14:52:00
4 │ 2022-10-03 14:53:00
5 │ 2022-10-03 14:54:00
6 │ 2022-10-03 14:56:00
7 │ 2022-10-03 14:57:00
8 │ 2022-10-03 14:58:00
9 │ 2022-10-03 14:59:00
I want to extract all consecutive date ranges from the table, where the difference between each timestamp in the range is only one minute. This is the desired result:
│ From │ To
──┼─────────────────────┼─────────────────────
1 │ 2022-09-23 15:01:00 │ 2022-09-23 15:02:00
2 │ 2022-10-03 14:52:00 │ 2022-10-03 14:54:00
3 │ 2022-10-03 14:56:00 │ 2022-10-03 14:59:00
Note that e.g. rows 3, 4 and 5 are grouped into one row, because the timestamps 2022-10-03 14:52, 2022-10-03 14:53 and 2022-10-03 14:54 are consecutive. The remaining timestamps from 2022-10-03 14:56 to 2022-10-03 14:59 get grouped into their own range, because there is a gap between the ranges (where 2022-10-03 14:55 would have been).
What is an SQL query that achieves this?
Answer 1
Score: 3
This is a gaps-and-islands problem. The first CTE uses the window function LEAD() to flag every row whose next timestamp is not exactly one minute later (diff = 1 marks the last row of a range). The second CTE turns those flags into a group id by counting the flags on all preceding rows:
with cte as (
    select *,
           case when datediff(mi, Timestamps, lead(Timestamps) over (order by Timestamps)) = 1
                then 0 else 1 end as diff
    from mytable
),
cte2 as (
    -- running total of the flags before the current row,
    -- so the row that ends a range stays inside that range
    select *, sum(diff) over (order by Timestamps) - diff as grp
    from cte
)
select grp + 1 as range_id, min(Timestamps) as [From], max(Timestamps) as [To]
from cte2
group by grp
order by grp
Result:
range_id From To
1 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
2 2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
3 2022-10-03 14:56:00.000 2022-10-03 14:59:00.000
Answer 2
Score: 2
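This seems to be a classic gaps-and-islands problem. Within a run of consecutive minutes, the row number and the minute count both increase by one per row, so their difference stays constant and can serve as the group key.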
Example
Select TS1 = min([timestamps])
      ,TS2 = max([timestamps])
 From (
        Select *
              ,Grp = row_number() over (order by [timestamps])
                     - datediff(minute, 0, [timestamps])
         From YourTable
      ) A
 Group By Grp
Results
TS1 TS2
2022-10-03 14:56:00.000 2022-10-03 14:59:00.000
2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
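To see why that difference works as a group key, here is a minimal Python sketch of the same arithmetic, not the SQL itself. It assumes the timestamps are distinct and fall on whole minutes; `EPOCH`, `minute_index` and `group_key` are names made up for this illustration, with `EPOCH` standing in for the date that the literal 0 converts to in SQL Server (1900-01-01).

```python
from datetime import datetime
from itertools import groupby

# Assumption: stands in for the literal 0 in DATEDIFF(minute, 0, ...).
EPOCH = datetime(1900, 1, 1)

timestamps = [
    datetime(2022, 9, 23, 15, 1), datetime(2022, 9, 23, 15, 2),
    datetime(2022, 10, 3, 14, 52), datetime(2022, 10, 3, 14, 53),
    datetime(2022, 10, 3, 14, 54), datetime(2022, 10, 3, 14, 56),
    datetime(2022, 10, 3, 14, 57), datetime(2022, 10, 3, 14, 58),
    datetime(2022, 10, 3, 14, 59),
]


def minute_index(ts):
    """Whole minutes elapsed since EPOCH (hypothetical helper)."""
    return int((ts - EPOCH).total_seconds() // 60)


def group_key(pair):
    """Row number minus minute index: constant inside a consecutive run."""
    row_number, ts = pair
    return row_number - minute_index(ts)


# Number the rows in timestamp order, like ROW_NUMBER() OVER (ORDER BY ...).
rows = list(enumerate(sorted(timestamps), start=1))

# Adjacent rows with the same key form one island; keep its first and last value.
islands = []
for _, members in groupby(rows, key=group_key):
    run = [ts for _, ts in members]
    islands.append((run[0], run[-1]))

for start, end in islands:
    print(start, "->", end)
```

Each missing minute makes the minute index jump ahead of the row number, so the key changes exactly at a gap.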
Answer 3
Score: 0
You can also use sessionization for this, which I find more readable: start a new session id whenever the gap between a timestamp and the previous one is not exactly 1 minute. It is a two-step query: the first step sets a counter to 1 on every row that starts a new session, and the query around it takes the running sum of that counter:
WITH
-- your input ...
indata(id,ts) AS (
            SELECT 1,{ts '2022-09-23 15:01:00'}
  UNION ALL SELECT 2,{ts '2022-09-23 15:02:00'}
  UNION ALL SELECT 3,{ts '2022-10-03 14:52:00'}
  UNION ALL SELECT 4,{ts '2022-10-03 14:53:00'}
  UNION ALL SELECT 5,{ts '2022-10-03 14:54:00'}
  UNION ALL SELECT 6,{ts '2022-10-03 14:56:00'}
  UNION ALL SELECT 7,{ts '2022-10-03 14:57:00'}
  UNION ALL SELECT 8,{ts '2022-10-03 14:58:00'}
  UNION ALL SELECT 9,{ts '2022-10-03 14:59:00'}
)
-- real query starts here - replace following comma with "WITH"
,
-- sessionization part 1: counter is 1 when the gap to the previous row
-- is not exactly 1 min (this includes the first row, which has no previous row)
sess1 AS (
  SELECT
    *
  , CASE
      WHEN DATEDIFF(minute,LAG(ts) OVER(ORDER BY ts), ts) = 1 THEN 0
      ELSE 1
    END AS counter
  FROM indata
)
,
-- get the running sum of the obtained counter above to get a session id
sess2 AS (
  SELECT
    id
  , ts
  , SUM(counter) OVER(ORDER BY ts) AS session_id
  FROM sess1
)
SELECT
  session_id
, MIN(ts) AS from_ts
, MAX(ts) AS to_ts
FROM sess2
GROUP BY session_id
ORDER BY 1
session_id | from_ts | to_ts |
---|---|---|
1 | 2022-09-23 15:01:00.000 | 2022-09-23 15:02:00.000 |
2 | 2022-10-03 14:52:00.000 | 2022-10-03 14:54:00.000 |
3 | 2022-10-03 14:56:00.000 | 2022-10-03 14:59:00.000 |
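If it helps to see the intermediate columns, the two steps can be traced outside the database. The following is only a Python sketch of the same logic, not a replacement for the query; the names `ts_values`, `counters`, `session_ids` and `sessions` are made up for illustration. It compares exact one-minute differences, whereas DATEDIFF(minute, ...) counts minute boundaries crossed, so the two only agree when the timestamps fall on whole minutes.

```python
from datetime import datetime, timedelta
from itertools import accumulate

ONE_MINUTE = timedelta(minutes=1)

# Same nine timestamps as in the question, sorted like ORDER BY ts.
ts_values = sorted([
    datetime(2022, 9, 23, 15, 1), datetime(2022, 9, 23, 15, 2),
    datetime(2022, 10, 3, 14, 52), datetime(2022, 10, 3, 14, 53),
    datetime(2022, 10, 3, 14, 54), datetime(2022, 10, 3, 14, 56),
    datetime(2022, 10, 3, 14, 57), datetime(2022, 10, 3, 14, 58),
    datetime(2022, 10, 3, 14, 59),
])

# Step 1 (like sess1): 0 if the previous row is exactly one minute earlier,
# otherwise 1. The first row has no previous row, so it also gets 1.
counters = [
    0 if i > 0 and ts - ts_values[i - 1] == ONE_MINUTE else 1
    for i, ts in enumerate(ts_values)
]

# Step 2 (like sess2): the running sum of the counter is the session id.
session_ids = list(accumulate(counters))

# Final step: first and last timestamp per session id (MIN / MAX ... GROUP BY).
sessions = {}
for sid, ts in zip(session_ids, ts_values):
    first, _ = sessions.get(sid, (ts, ts))
    sessions[sid] = (first, ts)

print(counters)     # [1, 0, 1, 0, 0, 1, 0, 0, 0]
print(session_ids)  # [1, 1, 2, 2, 2, 3, 3, 3, 3]
```

The counter is 1 only on rows 1, 3 and 6, which is where the three ranges in the result start.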