Find SCD-2 timerange overlaps when a sensor belongs to multiple labels simultaneously in SQL (Postgres)

huangapple go评论65阅读模式
英文:

Find SCD-2 timerange overlaps when a sensor belongs to multiple labels simultaneously in SQL (Postgres)

问题

我有一个表示不同传感器与不同标签之间时间关联的PostgreSQL表,使用SCD-2方法。表结构如下:

CREATE TABLE SensorLabel (
    sensor_id INT,
    label_id INT,
    start_time TIMESTAMPTZ,
    end_time TIMESTAMPTZ
);

这个表中的每一行表示传感器(sensor_id)在特定时间段(从start_time到end_time)与一个标签(label_id)关联(包括开始时间,不包括结束时间)。

现在我有一个问题,我需要找到一组标签的所有时间范围重叠部分。也就是说,我有一组标签{label1,label2,...,labelN},我想找到传感器与每个标签同时关联的所有时间范围。

请注意,每个标签关联的时间范围可能不同,因此这些范围的交集可能分解为较小的时间范围。我只想返回传感器与所有给定标签关联的那些时间范围。

我可以解决例如静态的2个标签或3个标签的情况,但在通用SQL中解决可变数量N个标签的问题有困难。

编辑:一些示例输入和输出:

sensor|label|from|to
1|1|2021-01-01|2021-10-01
1|2|2020-12-01|2021-05-01
1|2|2021-07-01|2021-09-01
1|3|2021-03-01|2021-06-01
1|3|2021-08-01|2021-12-01

输出:即标签1,2,3的(时间范围的)重叠时间范围:

sensor|from|to
1|2021-03-01|2021-05-01
1|2021-08-01|2021-09-01
英文:

I have a PostgreSQL table that represents the time-bound associations of different sensors with different labels, using a SCD-2 approach. The table structure is as follows:

CREATE TABLE SensorLabel (
    sensor_id INT,
    label_id INT,
    start_time TIMESTAMPTZ,
    end_time TIMESTAMPTZ
);

Each row in this table represents that a sensor (sensor_id) is associated with a label (label_id) during a specific time period [start_time to end_time) (so, resp. inclusive, exclusive).

Now I have a problem where I need to find all time range overlaps for a set of labels. That is, I have a set of labels {label1, label2, ..., labelN} and I want to find all time ranges when the sensor was associated with each of these labels at the same time.

Please note that the time range for each label association could be different, so the intersection of these ranges may break up into smaller time-ranges. I want to return only those time-ranges during which the sensor is associated with all the given labels.

I can solve for say a static 2 labels or 3 labels but have problems wrapping this in a generic SQL that solves for a variable N labels

EDIT: some sample input and output

sensor|label|from|to
1|1|2021-01-01|2021-10-01
1|2|2020-12-01|2021-05-01
1|2|2021-07-01|2021-09-01
1|3|2021-03-01|2021-06-01
1|3|2021-08-01|2021-12-01

Ouput: i.e.: time-ranges where (time-ranges for) label 1,2,3, overlap:

sensor|from|to
1|2021-03-01|2021-05-01
1|2021-08-01|2021-09-01

答案1

得分: 1

以下是您要翻译的内容:

"我唯一能够使用函数完成的方式是:"

CREATE TABLE sensorlabel (
    sensor_id INT,
    label_id INT,
    start_time TIMESTAMPTZ,
    end_time TIMESTAMPTZ
);

insert into sensorlabel values
(1,1,'2021-01-01','2021-10-01'),
(1,2,'2020-12-01','2021-05-01'),
(1,2,'2021-07-01','2021-09-01'),
(1,3,'2021-03-01','2021-06-01'),
(1,3,'2021-08-01','2021-12-01');

CREATE OR REPLACE FUNCTION public.tsrange_fnc(l_ids integer[], s_id integer)
 RETURNS SETOF tstzrange
 LANGUAGE plpgsql
AS $function$
DECLARE

    _tsrange tstzrange;
    _prevrange tstzrange := NULL;
    _testrange tstzrange;

BEGIN
for _tsrange in select
        tstzrange(start_time, end_time)
    from
        sensorlabel
    where
        label_id = ANY(l_ids)
    and
        sensor_id = s_id
    order by
        start_time  LOOP
    IF _prevrange IS NULL THEN
        _prevrange = _tsrange;
    ELSE
        _testrange = _tsrange * _prevrange;
        IF _testrange = 'empty'::tstzrange THEN
            RETURN NEXT _prevrange;
            _prevrange = _tsrange;
        ELSE
           _prevrange =  _tsrange * _prevrange;
        END IF;
    END IF;


END LOOP;
            RETURN NEXT _prevrange;
END;
$function$

select tsrange_fnc(ARRAY[1, 2, 3], 1);
                      tsrange_fnc                      
-------------------------------------------------------
 ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
 ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")

 select tsrange_fnc(ARRAY[1, 2], 1);
                      tsrange_fnc                      
-------------------------------------------------------
 ["01/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
 ["07/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")

select tsrange_fnc(ARRAY[2, 3], 1);
                      tsrange_fnc                      
-------------------------------------------------------
 ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
 ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
英文:

The only way I could do it is with a function:

CREATE TABLE sensorlabel (
sensor_id INT,
label_id INT,
start_time TIMESTAMPTZ,
end_time TIMESTAMPTZ
);
insert into sensorlabel values
(1,1,'2021-01-01','2021-10-01'),
(1,2,'2020-12-01','2021-05-01'),
(1,2,'2021-07-01','2021-09-01'),
(1,3,'2021-03-01','2021-06-01'),
(1,3,'2021-08-01','2021-12-01');
CREATE OR REPLACE FUNCTION public.tsrange_fnc(l_ids integer[], s_id integer)
RETURNS SETOF tstzrange
LANGUAGE plpgsql
AS $function$
DECLARE
_tsrange tstzrange;
_prevrange tstzrange := NULL;
_testrange tstzrange;
BEGIN
for _tsrange in select
tstzrange(start_time, end_time)
from
sensorlabel
where
label_id = ANY(l_ids)
and
sensor_id = s_id
order by
start_time  LOOP
IF _prevrange IS NULL THEN
_prevrange = _tsrange;
ELSE
_testrange = _tsrange * _prevrange;
IF _testrange = 'empty'::tstzrange THEN
RETURN NEXT _prevrange;
_prevrange = _tsrange;
ELSE
_prevrange =  _tsrange * _prevrange;
END IF;
END IF;
END LOOP;
RETURN NEXT _prevrange;
END;
$function$
select tsrange_fnc(ARRAY[1, 2, 3], 1);
tsrange_fnc                      
-------------------------------------------------------
["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
select tsrange_fnc(ARRAY[1, 2], 1);
tsrange_fnc                      
-------------------------------------------------------
["01/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["07/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
select tsrange_fnc(ARRAY[2, 3], 1);
tsrange_fnc                      
-------------------------------------------------------
["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
</details>
# 答案2
**得分**: 0
使用递归查询可以找到重叠的传感器标签范围:
```SQL
WITH RECURSIVE
sensor_labels(sensor_id, label_id, start_time, end_time) AS (
VALUES (1, 1, '2021-01-01'::timestamp, '2021-10-01'::timestamp),
(1, 2, '2020-12-01'::timestamp, '2021-05-01'::timestamp),
(1, 2, '2021-07-01'::timestamp, '2021-09-01'::timestamp),
(1, 3, '2021-03-01'::timestamp, '2021-06-01'::timestamp),
(1, 3, '2021-08-01'::timestamp, '2021-12-01'::timestamp)),
labels AS (
SELECT l.label_id, row_number() OVER (ORDER BY l.label_id) AS n
FROM (VALUES (1), (2), (3)) l(label_id)),
cte AS (
SELECT sensor_labels.sensor_id,
greatest(coalesce(time_span.start_time, sensor_labels.start_time), sensor_labels.start_time) AS start_time,
least(coalesce(time_span.end_time, sensor_labels.end_time), sensor_labels.end_time) AS end_time,
labels.n
FROM labels
JOIN sensor_labels
USING (label_id)
JOIN (VALUES (NULL::timestamp, NULL::timestamp)) time_span(start_time, end_time)
ON (time_span.end_time IS NULL OR sensor_labels.start_time < time_span.end_time) AND
(time_span.start_time IS NULL OR sensor_labels IS NULL OR time_span.start_time < sensor_labels.end_time)
WHERE labels.n = 1
UNION ALL
SELECT sensor_labels.sensor_id,
greatest(cte.start_time, sensor_labels.start_time) AS start_time,
least(cte.end_time, sensor_labels.end_time) AS end_time,
labels.n
FROM labels
JOIN sensor_labels
USING (label_id)
JOIN cte
ON labels.n = cte.n + 1 AND
(cte.end_time IS NULL OR sensor_labels.start_time < cte.end_time) AND
(sensor_labels.end_time IS NULL OR cte.start_time < sensor_labels.end_time))
SELECT cte.sensor_id, cte.start_time, cte.end_time
FROM cte
WHERE cte.n = (SELECT max(labels.n) FROM labels)
ORDER BY cte.start_time, cte.sensor_id;

这个查询通过迭代标签来细化重叠的时间段。该查询假设 start_time/end_time 对定义了封闭-开放的时间间隔,sensor_labels 具有非空的 start_time,而 NULL 的 end_time 意味着一直到时间结束。CTE 的初始化部分包括 time_span,可用于有效地限制所选时间范围。对于任一边界的 NULL 值表示相应的边界没有约束。

英文:

Overlapping sensor label ranges can be found using a recursive query:

WITH RECURSIVE
sensor_labels(sensor_id, label_id, start_time, end_time) AS (
  VALUES (1, 1, &#39;2021-01-01&#39;::timestamp, &#39;2021-10-01&#39;::timestamp),
         (1, 2, &#39;2020-12-01&#39;::timestamp, &#39;2021-05-01&#39;::timestamp),
         (1, 2, &#39;2021-07-01&#39;::timestamp, &#39;2021-09-01&#39;::timestamp),
         (1, 3, &#39;2021-03-01&#39;::timestamp, &#39;2021-06-01&#39;::timestamp),
         (1, 3, &#39;2021-08-01&#39;::timestamp, &#39;2021-12-01&#39;::timestamp)),
labels AS (
  SELECT l.label_id, row_number() OVER (ORDER BY l.label_id) AS n
    FROM (VALUES (1), (2), (3)) l(label_id)),
cte AS (
  SELECT sensor_labels.sensor_id,
         greatest(coalesce(time_span.start_time, sensor_labels.start_time), sensor_labels.start_time) AS start_time,
         least(coalesce(time_span.end_time, sensor_labels.end_time), sensor_labels.end_time) AS end_time,
         labels.n
    FROM labels
    JOIN sensor_labels
      USING (label_id)
    JOIN (VALUES (NULL::timestamp, NULL::timestamp)) time_span(start_time, end_time)
      ON (time_span.end_time IS NULL OR sensor_labels.start_time &lt; time_span.end_time) AND
         (time_span.start_time IS NULL OR sensor_labels IS NULL OR time_span.start_time &lt; sensor_labels.end_time)
    WHERE labels.n = 1
  UNION ALL
  SELECT sensor_labels.sensor_id,
         greatest(cte.start_time, sensor_labels.start_time) AS start_time,
         least(cte.end_time, sensor_labels.end_time) AS end_time,
         labels.n
    FROM labels
    JOIN sensor_labels
      USING (label_id)
    JOIN cte
      ON labels.n = cte.n + 1 AND
         (cte.end_time IS NULL OR sensor_labels.start_time &lt; cte.end_time) AND
         (sensor_labels.end_time IS NULL OR cte.start_time &lt; sensor_labels.end_time))
SELECT cte.sensor_id, cte.start_time, cte.end_time
  FROM cte
  WHERE cte.n = (SELECT max(labels.n) FROM labels)
  ORDER BY cte.start_time, cte.sensor_id;

This query works by iterating over the labels to refine the overlapping times. This query assumes that start_time/end_time pairs define closed-open intervals, sensor_labels have non-NULL start_time, and NULL end_time means until the end of time. The initialization portion of the CTE includes time_span which can be used to efficiently limit the selected range of times. A NULL for either indicates that the corresponding bound is unconstrained.

huangapple
  • 本文由 发表于 2023年6月5日 01:59:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76401731.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定