英文:
Find SCD-2 timerange overlaps when a sensor belongs to multiple labels simultaneously in SQL (Postgres)
问题
我有一个表示不同传感器与不同标签之间时间关联的PostgreSQL表,使用SCD-2方法。表结构如下:
CREATE TABLE SensorLabel (
sensor_id INT,
label_id INT,
start_time TIMESTAMPTZ,
end_time TIMESTAMPTZ
);
这个表中的每一行表示传感器(sensor_id)在特定时间段(从start_time到end_time)与一个标签(label_id)关联(包括开始时间,不包括结束时间)。
现在我有一个问题,我需要找到一组标签的所有时间范围重叠部分。也就是说,我有一组标签{label1,label2,...,labelN},我想找到传感器与每个标签同时关联的所有时间范围。
请注意,每个标签关联的时间范围可能不同,因此这些范围的交集可能分解为较小的时间范围。我只想返回传感器与所有给定标签关联的那些时间范围。
我可以解决例如静态的2个标签或3个标签的情况,但在通用SQL中解决可变数量N个标签的问题有困难。
编辑:一些示例输入和输出:
sensor|label|from|to
1|1|2021-01-01|2021-10-01
1|2|2020-12-01|2021-05-01
1|2|2021-07-01|2021-09-01
1|3|2021-03-01|2021-06-01
1|3|2021-08-01|2021-12-01
输出:即标签1,2,3的(时间范围的)重叠时间范围:
sensor|from|to
1|2021-03-01|2021-05-01
1|2021-08-01|2021-09-01
英文:
I have a PostgreSQL table that represents the time-bound associations of different sensors with different labels, using a SCD-2 approach. The table structure is as follows:
CREATE TABLE SensorLabel (
sensor_id INT,
label_id INT,
start_time TIMESTAMPTZ,
end_time TIMESTAMPTZ
);
Each row in this table represents that a sensor (sensor_id) is associated with a label (label_id) during a specific time period [start_time to end_time) (so, resp. inclusive, exclusive).
Now I have a problem where I need to find all time range overlaps for a set of labels. That is, I have a set of labels {label1, label2, ..., labelN} and I want to find all time ranges when the sensor was associated with each of these labels at the same time.
Please note that the time range for each label association could be different, so the intersection of these ranges may break up into smaller time-ranges. I want to return only those time-ranges during which the sensor is associated with all the given labels.
I can solve for say a static 2 labels or 3 labels but have problems wrapping this in a generic SQL that solves for a variable N labels
EDIT: some sample input and output
sensor|label|from|to
1|1|2021-01-01|2021-10-01
1|2|2020-12-01|2021-05-01
1|2|2021-07-01|2021-09-01
1|3|2021-03-01|2021-06-01
1|3|2021-08-01|2021-12-01
Ouput: i.e.: time-ranges where (time-ranges for) label 1,2,3, overlap:
sensor|from|to
1|2021-03-01|2021-05-01
1|2021-08-01|2021-09-01
答案1
得分: 1
以下是您要翻译的内容:
"我唯一能够使用函数完成的方式是:"
CREATE TABLE sensorlabel (
sensor_id INT,
label_id INT,
start_time TIMESTAMPTZ,
end_time TIMESTAMPTZ
);
insert into sensorlabel values
(1,1,'2021-01-01','2021-10-01'),
(1,2,'2020-12-01','2021-05-01'),
(1,2,'2021-07-01','2021-09-01'),
(1,3,'2021-03-01','2021-06-01'),
(1,3,'2021-08-01','2021-12-01');
CREATE OR REPLACE FUNCTION public.tsrange_fnc(l_ids integer[], s_id integer)
RETURNS SETOF tstzrange
LANGUAGE plpgsql
AS $function$
DECLARE
_tsrange tstzrange;
_prevrange tstzrange := NULL;
_testrange tstzrange;
BEGIN
for _tsrange in select
tstzrange(start_time, end_time)
from
sensorlabel
where
label_id = ANY(l_ids)
and
sensor_id = s_id
order by
start_time LOOP
IF _prevrange IS NULL THEN
_prevrange = _tsrange;
ELSE
_testrange = _tsrange * _prevrange;
IF _testrange = 'empty'::tstzrange THEN
RETURN NEXT _prevrange;
_prevrange = _tsrange;
ELSE
_prevrange = _tsrange * _prevrange;
END IF;
END IF;
END LOOP;
RETURN NEXT _prevrange;
END;
$function$
select tsrange_fnc(ARRAY[1, 2, 3], 1);
tsrange_fnc
-------------------------------------------------------
["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
select tsrange_fnc(ARRAY[1, 2], 1);
tsrange_fnc
-------------------------------------------------------
["01/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["07/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
select tsrange_fnc(ARRAY[2, 3], 1);
tsrange_fnc
-------------------------------------------------------
["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
英文:
The only way I could do it is with a function:
CREATE TABLE sensorlabel (
sensor_id INT,
label_id INT,
start_time TIMESTAMPTZ,
end_time TIMESTAMPTZ
);
insert into sensorlabel values
(1,1,'2021-01-01','2021-10-01'),
(1,2,'2020-12-01','2021-05-01'),
(1,2,'2021-07-01','2021-09-01'),
(1,3,'2021-03-01','2021-06-01'),
(1,3,'2021-08-01','2021-12-01');
CREATE OR REPLACE FUNCTION public.tsrange_fnc(l_ids integer[], s_id integer)
RETURNS SETOF tstzrange
LANGUAGE plpgsql
AS $function$
DECLARE
_tsrange tstzrange;
_prevrange tstzrange := NULL;
_testrange tstzrange;
BEGIN
for _tsrange in select
tstzrange(start_time, end_time)
from
sensorlabel
where
label_id = ANY(l_ids)
and
sensor_id = s_id
order by
start_time LOOP
IF _prevrange IS NULL THEN
_prevrange = _tsrange;
ELSE
_testrange = _tsrange * _prevrange;
IF _testrange = 'empty'::tstzrange THEN
RETURN NEXT _prevrange;
_prevrange = _tsrange;
ELSE
_prevrange = _tsrange * _prevrange;
END IF;
END IF;
END LOOP;
RETURN NEXT _prevrange;
END;
$function$
select tsrange_fnc(ARRAY[1, 2, 3], 1);
tsrange_fnc
-------------------------------------------------------
["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
select tsrange_fnc(ARRAY[1, 2], 1);
tsrange_fnc
-------------------------------------------------------
["01/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["07/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
select tsrange_fnc(ARRAY[2, 3], 1);
tsrange_fnc
-------------------------------------------------------
["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
</details>
# 答案2
**得分**: 0
使用递归查询可以找到重叠的传感器标签范围:
```SQL
WITH RECURSIVE
sensor_labels(sensor_id, label_id, start_time, end_time) AS (
VALUES (1, 1, '2021-01-01'::timestamp, '2021-10-01'::timestamp),
(1, 2, '2020-12-01'::timestamp, '2021-05-01'::timestamp),
(1, 2, '2021-07-01'::timestamp, '2021-09-01'::timestamp),
(1, 3, '2021-03-01'::timestamp, '2021-06-01'::timestamp),
(1, 3, '2021-08-01'::timestamp, '2021-12-01'::timestamp)),
labels AS (
SELECT l.label_id, row_number() OVER (ORDER BY l.label_id) AS n
FROM (VALUES (1), (2), (3)) l(label_id)),
cte AS (
SELECT sensor_labels.sensor_id,
greatest(coalesce(time_span.start_time, sensor_labels.start_time), sensor_labels.start_time) AS start_time,
least(coalesce(time_span.end_time, sensor_labels.end_time), sensor_labels.end_time) AS end_time,
labels.n
FROM labels
JOIN sensor_labels
USING (label_id)
JOIN (VALUES (NULL::timestamp, NULL::timestamp)) time_span(start_time, end_time)
ON (time_span.end_time IS NULL OR sensor_labels.start_time < time_span.end_time) AND
(time_span.start_time IS NULL OR sensor_labels IS NULL OR time_span.start_time < sensor_labels.end_time)
WHERE labels.n = 1
UNION ALL
SELECT sensor_labels.sensor_id,
greatest(cte.start_time, sensor_labels.start_time) AS start_time,
least(cte.end_time, sensor_labels.end_time) AS end_time,
labels.n
FROM labels
JOIN sensor_labels
USING (label_id)
JOIN cte
ON labels.n = cte.n + 1 AND
(cte.end_time IS NULL OR sensor_labels.start_time < cte.end_time) AND
(sensor_labels.end_time IS NULL OR cte.start_time < sensor_labels.end_time))
SELECT cte.sensor_id, cte.start_time, cte.end_time
FROM cte
WHERE cte.n = (SELECT max(labels.n) FROM labels)
ORDER BY cte.start_time, cte.sensor_id;
这个查询通过迭代标签来细化重叠的时间段。该查询假设 start_time
/end_time
对定义了封闭-开放的时间间隔,sensor_labels
具有非空的 start_time
,而 NULL 的 end_time
意味着一直到时间结束。CTE 的初始化部分包括 time_span
,可用于有效地限制所选时间范围。对于任一边界的 NULL
值表示相应的边界没有约束。
英文:
Overlapping sensor label ranges can be found using a recursive query:
WITH RECURSIVE
sensor_labels(sensor_id, label_id, start_time, end_time) AS (
VALUES (1, 1, '2021-01-01'::timestamp, '2021-10-01'::timestamp),
(1, 2, '2020-12-01'::timestamp, '2021-05-01'::timestamp),
(1, 2, '2021-07-01'::timestamp, '2021-09-01'::timestamp),
(1, 3, '2021-03-01'::timestamp, '2021-06-01'::timestamp),
(1, 3, '2021-08-01'::timestamp, '2021-12-01'::timestamp)),
labels AS (
SELECT l.label_id, row_number() OVER (ORDER BY l.label_id) AS n
FROM (VALUES (1), (2), (3)) l(label_id)),
cte AS (
SELECT sensor_labels.sensor_id,
greatest(coalesce(time_span.start_time, sensor_labels.start_time), sensor_labels.start_time) AS start_time,
least(coalesce(time_span.end_time, sensor_labels.end_time), sensor_labels.end_time) AS end_time,
labels.n
FROM labels
JOIN sensor_labels
USING (label_id)
JOIN (VALUES (NULL::timestamp, NULL::timestamp)) time_span(start_time, end_time)
ON (time_span.end_time IS NULL OR sensor_labels.start_time < time_span.end_time) AND
(time_span.start_time IS NULL OR sensor_labels IS NULL OR time_span.start_time < sensor_labels.end_time)
WHERE labels.n = 1
UNION ALL
SELECT sensor_labels.sensor_id,
greatest(cte.start_time, sensor_labels.start_time) AS start_time,
least(cte.end_time, sensor_labels.end_time) AS end_time,
labels.n
FROM labels
JOIN sensor_labels
USING (label_id)
JOIN cte
ON labels.n = cte.n + 1 AND
(cte.end_time IS NULL OR sensor_labels.start_time < cte.end_time) AND
(sensor_labels.end_time IS NULL OR cte.start_time < sensor_labels.end_time))
SELECT cte.sensor_id, cte.start_time, cte.end_time
FROM cte
WHERE cte.n = (SELECT max(labels.n) FROM labels)
ORDER BY cte.start_time, cte.sensor_id;
This query works by iterating over the labels to refine the overlapping times. This query assumes that start_time
/end_time
pairs define closed-open intervals, sensor_labels
have non-NULL start_time
, and NULL end_time
means until the end of time. The initialization portion of the CTE includes time_span
which can be used to efficiently limit the selected range of times. A NULL
for either indicates that the corresponding bound is unconstrained.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论