Find SCD-2 timerange overlaps when a sensor belongs to multiple labels simultaneously in SQL (Postgres)

huangapple go评论103阅读模式
英文:

Find SCD-2 timerange overlaps when a sensor belongs to multiple labels simultaneously in SQL (Postgres)

问题

我有一个表示不同传感器与不同标签之间时间关联的PostgreSQL表,使用SCD-2方法。表结构如下:

  1. CREATE TABLE SensorLabel (
  2. sensor_id INT,
  3. label_id INT,
  4. start_time TIMESTAMPTZ,
  5. end_time TIMESTAMPTZ
  6. );

这个表中的每一行表示传感器(sensor_id)在特定时间段(从start_time到end_time)与一个标签(label_id)关联(包括开始时间,不包括结束时间)。

现在我有一个问题,我需要找到一组标签的所有时间范围重叠部分。也就是说,我有一组标签{label1,label2,...,labelN},我想找到传感器与每个标签同时关联的所有时间范围。

请注意,每个标签关联的时间范围可能不同,因此这些范围的交集可能分解为较小的时间范围。我只想返回传感器与所有给定标签关联的那些时间范围。

我可以解决例如静态的2个标签或3个标签的情况,但在通用SQL中解决可变数量N个标签的问题有困难。

编辑:一些示例输入和输出:

  1. sensor|label|from|to
  2. 1|1|2021-01-01|2021-10-01
  3. 1|2|2020-12-01|2021-05-01
  4. 1|2|2021-07-01|2021-09-01
  5. 1|3|2021-03-01|2021-06-01
  6. 1|3|2021-08-01|2021-12-01

输出:即标签1,2,3的(时间范围的)重叠时间范围:

  1. sensor|from|to
  2. 1|2021-03-01|2021-05-01
  3. 1|2021-08-01|2021-09-01
英文:

I have a PostgreSQL table that represents the time-bound associations of different sensors with different labels, using a SCD-2 approach. The table structure is as follows:

  1. CREATE TABLE SensorLabel (
  2. sensor_id INT,
  3. label_id INT,
  4. start_time TIMESTAMPTZ,
  5. end_time TIMESTAMPTZ
  6. );

Each row in this table represents that a sensor (sensor_id) is associated with a label (label_id) during a specific time period [start_time to end_time) (so, resp. inclusive, exclusive).

Now I have a problem where I need to find all time range overlaps for a set of labels. That is, I have a set of labels {label1, label2, ..., labelN} and I want to find all time ranges when the sensor was associated with each of these labels at the same time.

Please note that the time range for each label association could be different, so the intersection of these ranges may break up into smaller time-ranges. I want to return only those time-ranges during which the sensor is associated with all the given labels.

I can solve for say a static 2 labels or 3 labels but have problems wrapping this in a generic SQL that solves for a variable N labels

EDIT: some sample input and output

  1. sensor|label|from|to
  2. 1|1|2021-01-01|2021-10-01
  3. 1|2|2020-12-01|2021-05-01
  4. 1|2|2021-07-01|2021-09-01
  5. 1|3|2021-03-01|2021-06-01
  6. 1|3|2021-08-01|2021-12-01

Ouput: i.e.: time-ranges where (time-ranges for) label 1,2,3, overlap:

  1. sensor|from|to
  2. 1|2021-03-01|2021-05-01
  3. 1|2021-08-01|2021-09-01

答案1

得分: 1

以下是您要翻译的内容:

"我唯一能够使用函数完成的方式是:"

  1. CREATE TABLE sensorlabel (
  2. sensor_id INT,
  3. label_id INT,
  4. start_time TIMESTAMPTZ,
  5. end_time TIMESTAMPTZ
  6. );
  7. insert into sensorlabel values
  8. (1,1,'2021-01-01','2021-10-01'),
  9. (1,2,'2020-12-01','2021-05-01'),
  10. (1,2,'2021-07-01','2021-09-01'),
  11. (1,3,'2021-03-01','2021-06-01'),
  12. (1,3,'2021-08-01','2021-12-01');
  13. CREATE OR REPLACE FUNCTION public.tsrange_fnc(l_ids integer[], s_id integer)
  14. RETURNS SETOF tstzrange
  15. LANGUAGE plpgsql
  16. AS $function$
  17. DECLARE
  18. _tsrange tstzrange;
  19. _prevrange tstzrange := NULL;
  20. _testrange tstzrange;
  21. BEGIN
  22. for _tsrange in select
  23. tstzrange(start_time, end_time)
  24. from
  25. sensorlabel
  26. where
  27. label_id = ANY(l_ids)
  28. and
  29. sensor_id = s_id
  30. order by
  31. start_time LOOP
  32. IF _prevrange IS NULL THEN
  33. _prevrange = _tsrange;
  34. ELSE
  35. _testrange = _tsrange * _prevrange;
  36. IF _testrange = 'empty'::tstzrange THEN
  37. RETURN NEXT _prevrange;
  38. _prevrange = _tsrange;
  39. ELSE
  40. _prevrange = _tsrange * _prevrange;
  41. END IF;
  42. END IF;
  43. END LOOP;
  44. RETURN NEXT _prevrange;
  45. END;
  46. $function$
  47. select tsrange_fnc(ARRAY[1, 2, 3], 1);
  48. tsrange_fnc
  49. -------------------------------------------------------
  50. ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
  51. ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
  52. select tsrange_fnc(ARRAY[1, 2], 1);
  53. tsrange_fnc
  54. -------------------------------------------------------
  55. ["01/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
  56. ["07/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
  57. select tsrange_fnc(ARRAY[2, 3], 1);
  58. tsrange_fnc
  59. -------------------------------------------------------
  60. ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
  61. ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
英文:

The only way I could do it is with a function:

  1. CREATE TABLE sensorlabel (
  2. sensor_id INT,
  3. label_id INT,
  4. start_time TIMESTAMPTZ,
  5. end_time TIMESTAMPTZ
  6. );
  7. insert into sensorlabel values
  8. (1,1,'2021-01-01','2021-10-01'),
  9. (1,2,'2020-12-01','2021-05-01'),
  10. (1,2,'2021-07-01','2021-09-01'),
  11. (1,3,'2021-03-01','2021-06-01'),
  12. (1,3,'2021-08-01','2021-12-01');
  13. CREATE OR REPLACE FUNCTION public.tsrange_fnc(l_ids integer[], s_id integer)
  14. RETURNS SETOF tstzrange
  15. LANGUAGE plpgsql
  16. AS $function$
  17. DECLARE
  18. _tsrange tstzrange;
  19. _prevrange tstzrange := NULL;
  20. _testrange tstzrange;
  21. BEGIN
  22. for _tsrange in select
  23. tstzrange(start_time, end_time)
  24. from
  25. sensorlabel
  26. where
  27. label_id = ANY(l_ids)
  28. and
  29. sensor_id = s_id
  30. order by
  31. start_time LOOP
  32. IF _prevrange IS NULL THEN
  33. _prevrange = _tsrange;
  34. ELSE
  35. _testrange = _tsrange * _prevrange;
  36. IF _testrange = 'empty'::tstzrange THEN
  37. RETURN NEXT _prevrange;
  38. _prevrange = _tsrange;
  39. ELSE
  40. _prevrange = _tsrange * _prevrange;
  41. END IF;
  42. END IF;
  43. END LOOP;
  44. RETURN NEXT _prevrange;
  45. END;
  46. $function$
  47. select tsrange_fnc(ARRAY[1, 2, 3], 1);
  48. tsrange_fnc
  49. -------------------------------------------------------
  50. ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
  51. ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
  52. select tsrange_fnc(ARRAY[1, 2], 1);
  53. tsrange_fnc
  54. -------------------------------------------------------
  55. ["01/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
  56. ["07/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
  57. select tsrange_fnc(ARRAY[2, 3], 1);
  58. tsrange_fnc
  59. -------------------------------------------------------
  60. ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
  61. ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
  62. </details>
  63. # 答案2
  64. **得分**: 0
  65. 使用递归查询可以找到重叠的传感器标签范围:
  66. ```SQL
  67. WITH RECURSIVE
  68. sensor_labels(sensor_id, label_id, start_time, end_time) AS (
  69. VALUES (1, 1, '2021-01-01'::timestamp, '2021-10-01'::timestamp),
  70. (1, 2, '2020-12-01'::timestamp, '2021-05-01'::timestamp),
  71. (1, 2, '2021-07-01'::timestamp, '2021-09-01'::timestamp),
  72. (1, 3, '2021-03-01'::timestamp, '2021-06-01'::timestamp),
  73. (1, 3, '2021-08-01'::timestamp, '2021-12-01'::timestamp)),
  74. labels AS (
  75. SELECT l.label_id, row_number() OVER (ORDER BY l.label_id) AS n
  76. FROM (VALUES (1), (2), (3)) l(label_id)),
  77. cte AS (
  78. SELECT sensor_labels.sensor_id,
  79. greatest(coalesce(time_span.start_time, sensor_labels.start_time), sensor_labels.start_time) AS start_time,
  80. least(coalesce(time_span.end_time, sensor_labels.end_time), sensor_labels.end_time) AS end_time,
  81. labels.n
  82. FROM labels
  83. JOIN sensor_labels
  84. USING (label_id)
  85. JOIN (VALUES (NULL::timestamp, NULL::timestamp)) time_span(start_time, end_time)
  86. ON (time_span.end_time IS NULL OR sensor_labels.start_time < time_span.end_time) AND
  87. (time_span.start_time IS NULL OR sensor_labels IS NULL OR time_span.start_time < sensor_labels.end_time)
  88. WHERE labels.n = 1
  89. UNION ALL
  90. SELECT sensor_labels.sensor_id,
  91. greatest(cte.start_time, sensor_labels.start_time) AS start_time,
  92. least(cte.end_time, sensor_labels.end_time) AS end_time,
  93. labels.n
  94. FROM labels
  95. JOIN sensor_labels
  96. USING (label_id)
  97. JOIN cte
  98. ON labels.n = cte.n + 1 AND
  99. (cte.end_time IS NULL OR sensor_labels.start_time < cte.end_time) AND
  100. (sensor_labels.end_time IS NULL OR cte.start_time < sensor_labels.end_time))
  101. SELECT cte.sensor_id, cte.start_time, cte.end_time
  102. FROM cte
  103. WHERE cte.n = (SELECT max(labels.n) FROM labels)
  104. ORDER BY cte.start_time, cte.sensor_id;

这个查询通过迭代标签来细化重叠的时间段。该查询假设 start_time/end_time 对定义了封闭-开放的时间间隔,sensor_labels 具有非空的 start_time,而 NULL 的 end_time 意味着一直到时间结束。CTE 的初始化部分包括 time_span,可用于有效地限制所选时间范围。对于任一边界的 NULL 值表示相应的边界没有约束。

英文:

Overlapping sensor label ranges can be found using a recursive query:

  1. WITH RECURSIVE
  2. sensor_labels(sensor_id, label_id, start_time, end_time) AS (
  3. VALUES (1, 1, &#39;2021-01-01&#39;::timestamp, &#39;2021-10-01&#39;::timestamp),
  4. (1, 2, &#39;2020-12-01&#39;::timestamp, &#39;2021-05-01&#39;::timestamp),
  5. (1, 2, &#39;2021-07-01&#39;::timestamp, &#39;2021-09-01&#39;::timestamp),
  6. (1, 3, &#39;2021-03-01&#39;::timestamp, &#39;2021-06-01&#39;::timestamp),
  7. (1, 3, &#39;2021-08-01&#39;::timestamp, &#39;2021-12-01&#39;::timestamp)),
  8. labels AS (
  9. SELECT l.label_id, row_number() OVER (ORDER BY l.label_id) AS n
  10. FROM (VALUES (1), (2), (3)) l(label_id)),
  11. cte AS (
  12. SELECT sensor_labels.sensor_id,
  13. greatest(coalesce(time_span.start_time, sensor_labels.start_time), sensor_labels.start_time) AS start_time,
  14. least(coalesce(time_span.end_time, sensor_labels.end_time), sensor_labels.end_time) AS end_time,
  15. labels.n
  16. FROM labels
  17. JOIN sensor_labels
  18. USING (label_id)
  19. JOIN (VALUES (NULL::timestamp, NULL::timestamp)) time_span(start_time, end_time)
  20. ON (time_span.end_time IS NULL OR sensor_labels.start_time &lt; time_span.end_time) AND
  21. (time_span.start_time IS NULL OR sensor_labels IS NULL OR time_span.start_time &lt; sensor_labels.end_time)
  22. WHERE labels.n = 1
  23. UNION ALL
  24. SELECT sensor_labels.sensor_id,
  25. greatest(cte.start_time, sensor_labels.start_time) AS start_time,
  26. least(cte.end_time, sensor_labels.end_time) AS end_time,
  27. labels.n
  28. FROM labels
  29. JOIN sensor_labels
  30. USING (label_id)
  31. JOIN cte
  32. ON labels.n = cte.n + 1 AND
  33. (cte.end_time IS NULL OR sensor_labels.start_time &lt; cte.end_time) AND
  34. (sensor_labels.end_time IS NULL OR cte.start_time &lt; sensor_labels.end_time))
  35. SELECT cte.sensor_id, cte.start_time, cte.end_time
  36. FROM cte
  37. WHERE cte.n = (SELECT max(labels.n) FROM labels)
  38. ORDER BY cte.start_time, cte.sensor_id;

This query works by iterating over the labels to refine the overlapping times. This query assumes that start_time/end_time pairs define closed-open intervals, sensor_labels have non-NULL start_time, and NULL end_time means until the end of time. The initialization portion of the CTE includes time_span which can be used to efficiently limit the selected range of times. A NULL for either indicates that the corresponding bound is unconstrained.

huangapple
  • 本文由 发表于 2023年6月5日 01:59:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76401731.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定