如何在 SQL 表中对连续的时间戳进行分组?

huangapple go评论82阅读模式
英文:

How to group consecutive timestamps in an SQL table?

问题

以下是一个可以实现此目标的SQL查询:

  1. WITH RankedTimestamps AS (
  2. SELECT
  3. Timestamps,
  4. LAG(Timestamps) OVER (ORDER BY Timestamps) AS PrevTimestamp
  5. FROM YourTableName
  6. )
  7. SELECT
  8. MIN(Timestamps) AS "From",
  9. MAX(Timestamps) AS "To"
  10. FROM (
  11. SELECT
  12. Timestamps,
  13. Timestamps - ROW_NUMBER() OVER (ORDER BY Timestamps) AS Grp
  14. FROM RankedTimestamps
  15. ) AS Groups
  16. GROUP BY Grp
  17. ORDER BY MIN(Timestamps);

请将 YourTableName 替换为实际的表名。这个查询会根据时间戳的连续性,将它们分组成所需的日期范围,并输出结果。

英文:

I have an SQL table in SQL Server with timestamps that looks like this:

  1. Timestamps
  2. ──┼────────────────────
  3. 1 2022-09-23 15:01:00
  4. 2 2022-09-23 15:02:00
  5. 3 2022-10-03 14:52:00
  6. 4 2022-10-03 14:53:00
  7. 5 2022-10-03 14:54:00
  8. 6 2022-10-03 14:56:00
  9. 7 2022-10-03 14:57:00
  10. 8 2022-10-03 14:58:00
  11. 9 2022-10-03 14:59:00

I want to extract all consecutive date ranges from the table, where the difference between each timestamp in the range is only one minute. This is the desired result:

  1. From To
  2. ──┼─────────────────────┼─────────────────────
  3. 1 2022-09-23 15:01:00 2022-09-23 15:02:00
  4. 2 2022-10-03 14:52:00 2022-10-03 14:54:00
  5. 3 2022-10-03 14:56:00 2022-10-03 14:59:00

Note that e.g. rows 3, 4 and 5 are grouped into one row, because the timestamps 2022-10-03 14:52, 2022-10-03 14:53 and 2022-10-03 14:54 are consecutive. The remaining timestamps from 2022-10-03 14:56 to 2022-10-03 14:59 get grouped into its own range, because there is a gap between the ranges (where 2022-10-03 14:55 would have been).

What is an SQL query that achieves this?

答案1

得分: 3

这是一个关于“间隙和群岛”的问题,使用第一个cte来使用窗口函数LEAD()找到连续行之间的差异,使用第二个cte来找到每个连续行的组ID:

  1. with cte as (
  2. select *, lead(Timestamps) over (order by Timestamps) as lead,
  3. case when
  4. datediff(mi, Timestamps, LEAD(Timestamps) over (order by Timestamps)) = 1 then 0 else 1 end as diff
  5. from mytable
  6. ),
  7. cte2 as (
  8. select *, sum(diff) over(order by Timestamps) as grp
  9. from cte
  10. )
  11. select grp+1 as range_id, min(Timestamps) as [From], max(lead) as [To]
  12. from cte2
  13. where lead is not null
  14. group by grp

结果:

  1. range_id From To
  2. 1 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
  3. 2 2022-09-23 15:02:00.000 2022-10-03 14:54:00.000
  4. 3 2022-10-03 14:54:00.000 2022-10-03 14:59:00.000

演示链接

英文:

This is a gaps and islands issue,

Using the first cte to find differences between consecutive rows using the window function LEAD() and the second cte to find the group id for each consecutives rows:

  1. with cte as (
  2. select *, lead(Timestamps) over (order by Timestamps) as lead,
  3. case when
  4. datediff(mi, Timestamps, LEAD(Timestamps) over (order by Timestamps)) = 1 then 0 else 1 end as diff
  5. from mytable
  6. ),
  7. cte2 as (
  8. select *, sum(diff) over(order by Timestamps) as grp
  9. from cte
  10. )
  11. select grp+1 as range_id, min(Timestamps) as [From], max(lead) as [To]
  12. from cte2
  13. where lead is not null
  14. group by grp

Result :

  1. range_id From To
  2. 1 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
  3. 2 2022-09-23 15:02:00.000 2022-10-03 14:54:00.000
  4. 3 2022-10-03 14:54:00.000 2022-10-03 14:59:00.000

Demo here

答案2

得分: 2

这似乎是一个典型的间隙和岛屿问题。

示例

  1. 选择 TS1 = 最小([时间戳])
  2. ,TS2 = 最大([时间戳])
  3. (
  4. 选择 *
  5. ,Grp = 行号() over( 按[时间戳]顺序)
  6. - 分钟差(0,[时间戳])
  7. 你的表
  8. ) A
  9. Grp 分组

结果

  1. TS1 TS2
  2. 2022-10-03 14:56:00.000 2022-10-03 14:59:00.000
  3. 2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
  4. 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
英文:

This seems to be a classic Gaps-and-Islands.

Example

  1. Select TS1 = min([timestamps])
  2. ,TS2 = max([timestamps])
  3. From (
  4. Select *
  5. ,Grp = row_number() over( order by [timestamps])
  6. - datediff(minute,0,[timestamps])
  7. From YourTable
  8. ) A
  9. Group By Grp

Results

  1. TS1 TS2
  2. 2022-10-03 14:56:00.000 2022-10-03 14:59:00.000
  3. 2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
  4. 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000

答案3

得分: 0

你也可以使用“sessionization”来实现这一点,我觉得这更易读:只要两个时间戳之间的间隔不是1分钟,就创建一个新的会话ID。这是一个两步查询,其中一个步骤在每次不是一分钟时将计数器设置为1,另一个步骤在其周围获取该计数器的累积总数:

  1. WITH
  2. -- your input ...
  3. indata(id,ts) AS (
  4. SELECT 1,{ts '2022-09-23 15:01:00'}
  5. UNION ALL SELECT 2,{ts '2022-09-23 15:02:00'}
  6. UNION ALL SELECT 3,{ts '2022-10-03 14:52:00'}
  7. UNION ALL SELECT 4,{ts '2022-10-03 14:53:00'}
  8. UNION ALL SELECT 5,{ts '2022-10-03 14:54:00'}
  9. UNION ALL SELECT 6,{ts '2022-10-03 14:56:00'}
  10. UNION ALL SELECT 7,{ts '2022-10-03 14:57:00'}
  11. UNION ALL SELECT 8,{ts '2022-10-03 14:58:00'}
  12. UNION ALL SELECT 9,{ts '2022-10-03 14:59:00'}
  13. )
  14. -- real query starts here - replace following comma with "WITH"
  15. ,
  16. -- sessionization part 1: counter at 1 if gap > 1 min
  17. sess1 AS (
  18. SELECT
  19. *
  20. ,CASE
  21. WHEN DATEDIFF(minute,LAG(ts) OVER(ORDER BY ts), ts) = 1 THEN 0
  22. ELSE 1
  23. END AS counter
  24. FROM indata
  25. )
  26. ,
  27. -- get the running sum of the obtained counter above to get a session id
  28. sess2 AS (
  29. SELECT
  30. id
  31. , ts
  32. , SUM(counter) OVER(ORDER BY ts) AS session_id
  33. FROM sess1
  34. )
  35. SELECT
  36. session_id
  37. , MIN(ts) AS from_ts
  38. , MAX(ts) AS to_ts
  39. FROM sess2
  40. GROUP BY session_id
  41. ORDER BY 1
session_id from_ts to_ts
1 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
2 2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
3 2022-10-03 14:56:00.000 2022-10-03 14:59:00.000

fiddle

英文:

You can also use sessionization for that, I find that more readable: Create a new session id as soon as the gap between two timestamps is not 1 min. A two-step query, with a counter set to 1 every time it's not one minute, and a query around it that gets the running sum of that counter:

  1. WITH
  2. -- your input ...
  3. indata(id,ts) AS (
  4. SELECT 1,{ts '2022-09-23 15:01:00'}
  5. UNION ALL SELECT 2,{ts '2022-09-23 15:02:00'}
  6. UNION ALL SELECT 3,{ts '2022-10-03 14:52:00'}
  7. UNION ALL SELECT 4,{ts '2022-10-03 14:53:00'}
  8. UNION ALL SELECT 5,{ts '2022-10-03 14:54:00'}
  9. UNION ALL SELECT 6,{ts '2022-10-03 14:56:00'}
  10. UNION ALL SELECT 7,{ts '2022-10-03 14:57:00'}
  11. UNION ALL SELECT 8,{ts '2022-10-03 14:58:00'}
  12. UNION ALL SELECT 9,{ts '2022-10-03 14:59:00'}
  13. )
  14. -- real query starts here - replace following comma with "WITH"
  15. ,
  16. -- sessionization part 1: counter at 1 if gap > 1 min
  17. sess1 AS (
  18. SELECT
  19. *
  20. ,CASE
  21. WHEN DATEDIFF(minute,LAG(ts) OVER(ORDER BY ts), ts) = 1 THEN 0
  22. ELSE 1
  23. END AS counter
  24. FROM indata
  25. )
  26. ,
  27. -- get the running sum of the obtained counter above to get a session id
  28. sess2 AS (
  29. SELECT
  30. id
  31. , ts
  32. , SUM(counter) OVER(ORDER BY ts) AS session_id
  33. FROM sess1
  34. )
  35. SELECT
  36. session_id
  37. , MIN(ts) AS from_ts
  38. , MAX(ts) AS to_ts
  39. FROM sess2
  40. GROUP BY session_id
  41. ORDER BY 1
session_id from_ts to_ts
1 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
2 2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
3 2022-10-03 14:56:00.000 2022-10-03 14:59:00.000

fiddle

huangapple
  • 本文由 发表于 2023年5月14日 18:55:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/76247098.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定