2023-05-14 18:55:56

How to group consecutive timestamps in an SQL table?

Question

The following SQL query produces the result requested in the question below:

WITH Numbered AS (
  SELECT
    Timestamps,
    -- Minutes since day 0 minus the row number stays constant
    -- within a run of timestamps that are one minute apart.
    DATEDIFF(minute, 0, Timestamps)
      - ROW_NUMBER() OVER (ORDER BY Timestamps) AS Grp
  FROM YourTableName
)
SELECT
  MIN(Timestamps) AS "From",
  MAX(Timestamps) AS "To"
FROM Numbered
GROUP BY Grp
ORDER BY MIN(Timestamps);

Replace YourTableName with the actual table name. The query groups the timestamps into runs of consecutive minutes and returns the first and last timestamp of each run.

I have an SQL table in SQL Server with timestamps that looks like this:

  │ Timestamps
──┼────────────────────
1 │ 2022-09-23 15:01:00
2 │ 2022-09-23 15:02:00
3 │ 2022-10-03 14:52:00
4 │ 2022-10-03 14:53:00
5 │ 2022-10-03 14:54:00
6 │ 2022-10-03 14:56:00
7 │ 2022-10-03 14:57:00
8 │ 2022-10-03 14:58:00
9 │ 2022-10-03 14:59:00

I want to extract all ranges of consecutive timestamps from the table, where neighboring timestamps within a range are exactly one minute apart. This is the desired result:

  │ From                │ To
──┼─────────────────────┼─────────────────────
1 │ 2022-09-23 15:01:00 │ 2022-09-23 15:02:00
2 │ 2022-10-03 14:52:00 │ 2022-10-03 14:54:00
3 │ 2022-10-03 14:56:00 │ 2022-10-03 14:59:00

Note that, for example, rows 3, 4 and 5 are grouped into one row, because the timestamps 2022-10-03 14:52, 2022-10-03 14:53 and 2022-10-03 14:54 are consecutive. The remaining timestamps from 2022-10-03 14:56 to 2022-10-03 14:59 are grouped into their own range, because there is a gap between the two ranges (where 2022-10-03 14:55 would have been).

What is an SQL query that achieves this?

Answer 1

Score: 3

This is a gaps-and-islands problem.

The first CTE uses the window function LEAD() to flag each row whose next timestamp is not exactly one minute later, and the second CTE turns those flags into a group id for each run of consecutive rows:

with cte as (
    -- diff = 1 when the next row is not exactly one minute later,
    -- i.e. the current row is the last row of its range
    select Timestamps,
           case when datediff(mi, Timestamps, lead(Timestamps) over (order by Timestamps)) = 1
                then 0 else 1 end as diff
    from mytable
),
cte2 as (
    -- count the range ends strictly before the current row to get its group id
    select Timestamps,
           coalesce(sum(diff) over (order by Timestamps
                                    rows between unbounded preceding and 1 preceding), 0) as grp
    from cte
)
select grp + 1 as range_id, min(Timestamps) as [From], max(Timestamps) as [To]
from cte2
group by grp
order by grp

Result:

range_id	From	                To
1	        2022-09-23 15:01:00.000	2022-09-23 15:02:00.000
2	        2022-10-03 14:52:00.000	2022-10-03 14:54:00.000
3	        2022-10-03 14:56:00.000	2022-10-03 14:59:00.000

Answer 2

Score: 2

This looks like a classic gaps-and-islands problem.

Example

Select TS1 = min([timestamps])
      ,TS2 = max([timestamps])
 From (
        Select *
              ,Grp = row_number() over (order by [timestamps])
                   - datediff(minute, 0, [timestamps])
          From YourTable
      ) A
 Group By Grp

Results

TS1                     	TS2
2022-10-03 14:56:00.000 	2022-10-03 14:59:00.000
2022-10-03 14:52:00.000 	2022-10-03 14:54:00.000
2022-09-23 15:01:00.000 	2022-09-23 15:02:00.000
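
To see why the subtraction in the query above isolates each range, it can help to look at the intermediate values. The following is a minimal sketch, assuming SQL Server and the sample rows from the question inlined through a VALUES list; the aliases rn and minute_no are illustrative names and not part of the answer.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH YourTable([timestamps]) AS (
    SELECT CAST(v.ts AS datetime)
    FROM (VALUES
        ('2022-09-23T15:01:00'), ('2022-09-23T15:02:00'),
        ('2022-10-03T14:52:00'), ('2022-10-03T14:53:00'),
        ('2022-10-03T14:54:00'), ('2022-10-03T14:56:00'),
        ('2022-10-03T14:57:00'), ('2022-10-03T14:58:00'),
        ('2022-10-03T14:59:00')
    ) AS v(ts)
)
SELECT [timestamps]
      -- position of the row in timestamp order
      ,rn        = ROW_NUMBER() OVER (ORDER BY [timestamps])
      -- whole minutes since day 0 (1900-01-01)
      ,minute_no = DATEDIFF(minute, 0, [timestamps])
      -- same expression as Grp in the answer
      ,Grp       = ROW_NUMBER() OVER (ORDER BY [timestamps])
                 - DATEDIFF(minute, 0, [timestamps])
  FROM YourTable
 ORDER BY [timestamps];
```

Within a run of one-minute steps, rn and minute_no both grow by 1 per row, so Grp stays constant. At a gap, minute_no jumps by more than 1 while rn still grows by 1, so Grp changes and a new range starts. This relies on the timestamps being distinct.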

Answer 3

Score: 0

You can also use sessionization for this, which I find more readable: start a new session id whenever the gap between two consecutive timestamps is not exactly one minute. It is a two-step query: the first step sets a counter to 1 whenever the gap is not one minute, and the query around it takes the running sum of that counter:

WITH
-- your input ...
indata(id,ts) AS (
          SELECT 1,{ts '2022-09-23 15:01:00'}
UNION ALL SELECT 2,{ts '2022-09-23 15:02:00'}
UNION ALL SELECT 3,{ts '2022-10-03 14:52:00'}
UNION ALL SELECT 4,{ts '2022-10-03 14:53:00'}
UNION ALL SELECT 5,{ts '2022-10-03 14:54:00'}
UNION ALL SELECT 6,{ts '2022-10-03 14:56:00'}
UNION ALL SELECT 7,{ts '2022-10-03 14:57:00'}
UNION ALL SELECT 8,{ts '2022-10-03 14:58:00'}
UNION ALL SELECT 9,{ts '2022-10-03 14:59:00'}
)
-- real query starts here - replace following comma with "WITH" 
,
-- sessionization part 1: counter at 1 if the gap to the previous row is not 1 min
sess1 AS (
  SELECT
   *   
  ,CASE
     WHEN DATEDIFF(minute,LAG(ts) OVER(ORDER BY ts), ts) = 1 THEN 0
     ELSE 1
   END AS counter
  FROM indata
)
,
-- get the running sum of the obtained counter above to get a session id
sess2 AS (
  SELECT
    id
  , ts
  , SUM(counter) OVER(ORDER BY ts) AS session_id
  FROM sess1
)
SELECT 
  session_id
, MIN(ts) AS from_ts
, MAX(ts) AS to_ts
FROM sess2
GROUP BY session_id
ORDER BY 1

session_id	from_ts	to_ts
1	2022-09-23 15:01:00.000	2022-09-23 15:02:00.000
2	2022-10-03 14:52:00.000	2022-10-03 14:54:00.000
3	2022-10-03 14:56:00.000	2022-10-03 14:59:00.000
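
The two steps can also be inspected row by row. This is a minimal sketch under the same assumptions as the answer (SQL Server, the question's sample rows); the CTE name gaps and the column gap_min are illustrative names, not part of the answer.

```sql
WITH indata(ts) AS (
    -- sample rows from the question
    SELECT CAST(v.ts AS datetime)
    FROM (VALUES
        ('2022-09-23T15:01:00'), ('2022-09-23T15:02:00'),
        ('2022-10-03T14:52:00'), ('2022-10-03T14:53:00'),
        ('2022-10-03T14:54:00'), ('2022-10-03T14:56:00'),
        ('2022-10-03T14:57:00'), ('2022-10-03T14:58:00'),
        ('2022-10-03T14:59:00')
    ) AS v(ts)
),
gaps AS (
    -- minutes since the previous row; NULL for the first row
    SELECT ts
         , DATEDIFF(minute, LAG(ts) OVER (ORDER BY ts), ts) AS gap_min
    FROM indata
)
SELECT ts
     , gap_min
       -- 1 marks the first row of a session (a NULL gap also falls into ELSE)
     , CASE WHEN gap_min = 1 THEN 0 ELSE 1 END AS counter
       -- running sum of the markers is the session id
     , SUM(CASE WHEN gap_min = 1 THEN 0 ELSE 1 END)
         OVER (ORDER BY ts) AS session_id
FROM gaps
ORDER BY ts;
```

For the sample data, counter comes out as 1, 0, 1, 0, 0, 1, 0, 0, 0 and session_id as 1, 1, 2, 2, 2, 3, 3, 3, 3, which matches the three ranges in the result above.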
