如何在 SQL 表中对连续的时间戳进行分组?

huangapple go评论51阅读模式
英文:

How to group consecutive timestamps in an SQL table?

问题

以下是一个可以实现此目标的SQL查询:

WITH RankedTimestamps AS (
  SELECT
    Timestamps,
    LAG(Timestamps) OVER (ORDER BY Timestamps) AS PrevTimestamp
  FROM YourTableName
)

SELECT
  MIN(Timestamps) AS "From",
  MAX(Timestamps) AS "To"
FROM (
  SELECT
    Timestamps,
    Timestamps - ROW_NUMBER() OVER (ORDER BY Timestamps) AS Grp
  FROM RankedTimestamps
) AS Groups
GROUP BY Grp
ORDER BY MIN(Timestamps);

请将 YourTableName 替换为实际的表名。这个查询会根据时间戳的连续性,将它们分组成所需的日期范围,并输出结果。

英文:

I have an SQL table in SQL Server with timestamps that looks like this:

  │ Timestamps
──┼────────────────────
1 │ 2022-09-23 15:01:00
2 │ 2022-09-23 15:02:00
3 │ 2022-10-03 14:52:00
4 │ 2022-10-03 14:53:00
5 │ 2022-10-03 14:54:00
6 │ 2022-10-03 14:56:00
7 │ 2022-10-03 14:57:00
8 │ 2022-10-03 14:58:00
9 │ 2022-10-03 14:59:00

I want to extract all consecutive date ranges from the table, where the difference between each timestamp in the range is only one minute. This is the desired result:

  │ From                │ To
──┼─────────────────────┼─────────────────────
1 │ 2022-09-23 15:01:00 │ 2022-09-23 15:02:00
2 │ 2022-10-03 14:52:00 │ 2022-10-03 14:54:00
3 │ 2022-10-03 14:56:00 │ 2022-10-03 14:59:00

Note that e.g. rows 3, 4 and 5 are grouped into one row, because the timestamps 2022-10-03 14:52, 2022-10-03 14:53 and 2022-10-03 14:54 are consecutive. The remaining timestamps from 2022-10-03 14:56 to 2022-10-03 14:59 get grouped into its own range, because there is a gap between the ranges (where 2022-10-03 14:55 would have been).

What is an SQL query that achieves this?

答案1

得分: 3

这是一个关于“间隙和群岛”的问题,使用第一个cte来使用窗口函数LEAD()找到连续行之间的差异,使用第二个cte来找到每个连续行的组ID:

with cte as (
    select *, lead(Timestamps) over (order by Timestamps) as lead,
              case when 
              datediff(mi, Timestamps, LEAD(Timestamps) over (order by Timestamps)) = 1 then 0 else 1 end as diff
    from mytable
),
cte2 as (
  select *, sum(diff) over(order by Timestamps) as grp
  from cte
)
select grp+1 as range_id, min(Timestamps) as [From], max(lead) as [To]
from cte2
where lead is not null
group by grp

结果:

range_id	From	                To
1	        2022-09-23 15:01:00.000	2022-09-23 15:02:00.000
2	        2022-09-23 15:02:00.000	2022-10-03 14:54:00.000
3	        2022-10-03 14:54:00.000	2022-10-03 14:59:00.000

演示链接

英文:

This is a gaps and islands issue,

Using the first cte to find differences between consecutive rows using the window function LEAD() and the second cte to find the group id for each consecutives rows:

with cte as (
    select *, lead(Timestamps) over (order by Timestamps) as lead,
              case when 
              datediff(mi, Timestamps, LEAD(Timestamps) over (order by Timestamps)) = 1 then 0 else 1 end as diff
    from mytable
),
cte2 as (
  select *, sum(diff) over(order by Timestamps) as grp
  from cte
)
select grp+1 as range_id, min(Timestamps) as [From], max(lead) as [To]
from cte2
where lead is not null
group by grp

Result :

range_id	From	                To
1	        2022-09-23 15:01:00.000	2022-09-23 15:02:00.000
2	        2022-09-23 15:02:00.000	2022-10-03 14:54:00.000
3	        2022-10-03 14:54:00.000	2022-10-03 14:59:00.000

Demo here

答案2

得分: 2

这似乎是一个典型的间隙和岛屿问题。

示例

选择 TS1 = 最小([时间戳])
          ,TS2 = 最大([时间戳])
  从 ( 
    	选择 *
    			  ,Grp = 行号() over( 按[时间戳]顺序)
    				   - 分钟差(0,[时间戳])
    		 从 你的表
          ) A
     按 Grp 分组

结果

TS1                     	TS2
2022-10-03 14:56:00.000 	2022-10-03 14:59:00.000
2022-10-03 14:52:00.000 	2022-10-03 14:54:00.000
2022-09-23 15:01:00.000 	2022-09-23 15:02:00.000
英文:

This seems to be a classic Gaps-and-Islands.

Example

Select TS1 = min([timestamps])
      ,TS2 = max([timestamps])
 From ( 
		Select *
			  ,Grp = row_number() over( order by [timestamps])
				   - datediff(minute,0,[timestamps])
		 From YourTable
      ) A
 Group By Grp

Results

TS1                     	TS2
2022-10-03 14:56:00.000 	2022-10-03 14:59:00.000
2022-10-03 14:52:00.000 	2022-10-03 14:54:00.000
2022-09-23 15:01:00.000 	2022-09-23 15:02:00.000

答案3

得分: 0

你也可以使用“sessionization”来实现这一点,我觉得这更易读:只要两个时间戳之间的间隔不是1分钟,就创建一个新的会话ID。这是一个两步查询,其中一个步骤在每次不是一分钟时将计数器设置为1,另一个步骤在其周围获取该计数器的累积总数:

WITH
-- your input ...
indata(id,ts) AS (
          SELECT 1,{ts '2022-09-23 15:01:00'}
UNION ALL SELECT 2,{ts '2022-09-23 15:02:00'}
UNION ALL SELECT 3,{ts '2022-10-03 14:52:00'}
UNION ALL SELECT 4,{ts '2022-10-03 14:53:00'}
UNION ALL SELECT 5,{ts '2022-10-03 14:54:00'}
UNION ALL SELECT 6,{ts '2022-10-03 14:56:00'}
UNION ALL SELECT 7,{ts '2022-10-03 14:57:00'}
UNION ALL SELECT 8,{ts '2022-10-03 14:58:00'}
UNION ALL SELECT 9,{ts '2022-10-03 14:59:00'}
)
-- real query starts here - replace following comma with "WITH" 
,
-- sessionization part 1: counter at 1 if gap > 1 min
sess1 AS (
  SELECT
   *   
  ,CASE
     WHEN DATEDIFF(minute,LAG(ts) OVER(ORDER BY ts), ts) = 1 THEN 0
     ELSE 1
   END AS counter
  FROM indata
)
,
-- get the running sum of the obtained counter above to get a session id
sess2 AS (
  SELECT
    id
  , ts
  , SUM(counter) OVER(ORDER BY ts) AS session_id
  FROM sess1
)
SELECT 
  session_id
, MIN(ts) AS from_ts
, MAX(ts) AS to_ts
FROM sess2
GROUP BY session_id
ORDER BY 1
session_id from_ts to_ts
1 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
2 2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
3 2022-10-03 14:56:00.000 2022-10-03 14:59:00.000

fiddle

英文:

You can also use sessionization for that, I find that more readable: Create a new session id as soon as the gap between two timestamps is not 1 min. A two-step query, with a counter set to 1 every time it's not one minute, and a query around it that gets the running sum of that counter:

WITH
-- your input ...
indata(id,ts) AS (
          SELECT 1,{ts '2022-09-23 15:01:00'}
UNION ALL SELECT 2,{ts '2022-09-23 15:02:00'}
UNION ALL SELECT 3,{ts '2022-10-03 14:52:00'}
UNION ALL SELECT 4,{ts '2022-10-03 14:53:00'}
UNION ALL SELECT 5,{ts '2022-10-03 14:54:00'}
UNION ALL SELECT 6,{ts '2022-10-03 14:56:00'}
UNION ALL SELECT 7,{ts '2022-10-03 14:57:00'}
UNION ALL SELECT 8,{ts '2022-10-03 14:58:00'}
UNION ALL SELECT 9,{ts '2022-10-03 14:59:00'}
)
-- real query starts here - replace following comma with "WITH" 
,
-- sessionization part 1: counter at 1 if gap > 1 min
sess1 AS (
  SELECT
   *   
  ,CASE
     WHEN DATEDIFF(minute,LAG(ts) OVER(ORDER BY ts), ts) = 1 THEN 0
     ELSE 1
   END AS counter
  FROM indata
)
,
-- get the running sum of the obtained counter above to get a session id
sess2 AS (
  SELECT
    id
  , ts
  , SUM(counter) OVER(ORDER BY ts) AS session_id
  FROM sess1
)
SELECT 
  session_id
, MIN(ts) AS from_ts
, MAX(ts) AS to_ts
FROM sess2
GROUP BY session_id
ORDER BY 1

session_id from_ts to_ts
1 2022-09-23 15:01:00.000 2022-09-23 15:02:00.000
2 2022-10-03 14:52:00.000 2022-10-03 14:54:00.000
3 2022-10-03 14:56:00.000 2022-10-03 14:59:00.000

fiddle

huangapple
  • 本文由 发表于 2023年5月14日 18:55:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/76247098.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定