英文:
Intersecting ranges in Oracle SQL and include also not intersecting ranges
问题
Here's the translation of the provided text without the code part:
我无法解决以下问题:
我有两个包含范围和值的数据集,例如:
数据集1
起始 | 结束 | 值 |
---|---|---|
0 | 20 | A |
20 | 50 | B |
50 | 150 | C |
150 | 180 | X |
数据集2
起始 | 结束 | 值 |
---|---|---|
10 | 30 | D |
70 | 100 | E |
我想要在SQL中找到这些范围的交集,并且还包括来自数据集1的所有范围。
所以结果应该是:
起始 | 结束 | 值 |
---|---|---|
0 | 10 | A |
10 | 20 | AD |
20 | 30 | BD |
30 | 50 | B |
50 | 70 | C |
70 | 100 | CE |
100 | 150 | C |
150 | 180 | X |
使用SQL很容易找到仅相交部分(10-20、20-30、70-100)和完全不相交的部分(150-180),但我正在解决部分相交的部分(0-10、30-50、50-70、100-150)。
我用以下SQL找到了相交的部分:
SELECT GREATEST(t1.range_start, t2.range_start) AS intersect_start,
LEAST(t1.range_end, t2.range_end) AS intersect_end
FROM ranges t1
JOIN ranges t2 ON t1.range_start < t2.range_end AND t1.range_end > t2.range_start
有人能指导我找到解决方案吗?最好是纯SQL,但也可以是PL/SQL。
英文:
I cannot figure out a solution the following problem:
I have two datasets containing ranges and a value, e.g.
Dataset1
From | To | Value |
---|---|---|
0 | 20 | A |
20 | 50 | B |
50 | 150 | C |
150 | 180 | X |
Dataset2
From | To | Value |
---|---|---|
10 | 30 | D |
70 | 100 | E |
I want to intersect these ranges but also include all ranges from Dataset1 using SQL.
So the result should be:
From | To | Value |
---|---|---|
0 | 10 | A |
10 | 20 | AD |
20 | 30 | BD |
30 | 50 | B |
50 | 70 | C |
70 | 100 | CE |
100 | 150 | C |
150 | 180 | X |
With SQL it is easy to find only the intersecting parts (10-20, 20-30, 70-100) and the completely not intersecting parts (150-180) - I am struggling with the partially intersecting parts (0-10, 30-50, 50-70, 100-150).
I used this for the intersecting parts:
SELECT GREATEST(t1.range_start, t2.range_start) AS intersect_start,
LEAST(t1.range_end, t2.range_end) AS intersect_end
FROM ranges t1
JOIN ranges t2 ON t1.range_start < t2.range_end AND t1.range_end > t2.range_start
Could someone point me to the solution? Would be nice in plain SQL, but could also be in PL/SQL
答案1
得分: 2
以下是翻译好的代码部分:
with un as (
select * from ds1 union all select * from ds2),
up as (
select distinct rng from un unpivot (rng for col in (r1, r2))),
ld as (
select rng r1, lead(rng) over (order by rng) r2 from up)
select ld.r1, ld.r2, listagg(value) within group (order by value) list
from ld join un on un.r1 < ld.r2 and ld.r1 < un.r2
group by ld.r1, ld.r2
如果需要进一步解释或有其他问题,请告诉我。
英文:
with un as (
select * from ds1 union all select * from ds2),
up as (
select distinct rng from un unpivot (rng for col in (r1, r2))),
ld as (
select rng r1, lead(rng) over (order by rng) r2 from up)
select ld.r1, ld.r2, listagg(value) within group (order by value) list
from ld join un on un.r1 < ld.r2 and ld.r1 < un.r2
group by ld.r1, ld.r2
Description:
un
- union of two tablesup
- unpivotedun
, just one column of all distinct rangesld
- above with next value of ranges
At the end ld
is joined with un
and ld
, listagg() groups values
答案2
得分: 1
以下是您要翻译的内容:
您可以对数据集进行反规整并组合它们,然后使用LEAD
分析函数查找范围中的下一个边界,然后使用递归查询来遍历这些边界并进行聚合,根据它们的范围的起始或结束分别添加或删除值:
WITH data (value, start_end, bound, dataset) AS (
SELECT value, start_end, bound, 1
FROM dataset1
UNPIVOT (bound FOR start_end IN ("FROM" AS 1, "TO" AS -1)) d
UNION ALL
SELECT value, start_end, bound, 2
FROM dataset2
UNPIVOT (bound FOR start_end IN ("FROM" AS 1, "TO" AS -1)) d
),
bounds (value, start_end, bound, next_bound, rn) AS (
SELECT value,
start_end,
bound,
LEAD(bound) OVER (ORDER BY bound, start_end, dataset),
ROW_NUMBER() OVER (ORDER BY bound, start_end, dataset)
FROM data
),
groups (value, bound, next_bound, rn) AS (
SELECT value, bound, next_bound, rn
FROM bounds
WHERE rn = 1
UNION ALL
SELECT CASE b.start_end
WHEN 1
THEN g.value || b.value
ELSE REPLACE(g.value, b.value)
END,
b.bound,
b.next_bound,
b.rn
FROM bounds b
INNER JOIN groups g
ON (g.rn + 1 = b.rn)
)
SELECT value,
bound AS "FROM",
next_bound AS "TO"
FROM groups
WHERE bound < next_bound
ORDER BY rn;
对于示例数据:
CREATE TABLE Dataset1 ("FROM", "TO", Value) AS
SELECT 0, 20, 'A' FROM DUAL UNION ALL
SELECT 20, 50, 'B' FROM DUAL UNION ALL
SELECT 50, 150, 'C' FROM DUAL UNION ALL
SELECT 150, 180, 'X' FROM DUAL;
CREATE TABLE Dataset2 ("FROM", "TO", Value) AS
SELECT 10, 30, 'D' FROM DUAL UNION ALL
SELECT 70, 100, 'E' FROM DUAL;
输出:
VALUE | FROM | TO |
---|---|---|
A | 0 | 10 |
AD | 10 | 20 |
DB | 20 | 30 |
B | 30 | 50 |
C | 50 | 70 |
CE | 70 | 100 |
C | 100 | 150 |
X | 150 | 180 |
英文:
You can unpivot the datasets and combine them and then use the LEAD
analytic function to find the next bound in the range and then use a recursive query to iterate over the bounds and aggregate, adding or removing values, as it comes to the start or end, respectively, of their ranges:
WITH data (value, start_end, bound, dataset) AS (
SELECT value, start_end, bound, 1
FROM dataset1
UNPIVOT (bound FOR start_end IN ("FROM" AS 1, "TO" AS -1)) d
UNION ALL
SELECT value, start_end, bound, 2
FROM dataset2
UNPIVOT (bound FOR start_end IN ("FROM" AS 1, "TO" AS -1)) d
),
bounds (value, start_end, bound, next_bound, rn) AS (
SELECT value,
start_end,
bound,
LEAD(bound) OVER (ORDER BY bound, start_end, dataset),
ROW_NUMBER() OVER (ORDER BY bound, start_end, dataset)
FROM data
),
groups (value, bound, next_bound, rn) AS (
SELECT value, bound, next_bound, rn
FROM bounds
WHERE rn = 1
UNION ALL
SELECT CASE b.start_end
WHEN 1
THEN g.value || b.value
ELSE REPLACE(g.value, b.value)
END,
b.bound,
b.next_bound,
b.rn
FROM bounds b
INNER JOIN groups g
ON (g.rn + 1 = b.rn)
)
SELECT value,
bound AS "FROM",
next_bound AS "TO"
FROM groups
WHERE bound < next_bound
ORDER BY rn;
Which, for the sample data:
CREATE TABLE Dataset1 ("FROM", "TO", Value) AS
SELECT 0, 20, 'A' FROM DUAL UNION ALL
SELECT 20, 50, 'B' FROM DUAL UNION ALL
SELECT 50, 150, 'C' FROM DUAL UNION ALL
SELECT 150, 180, 'X' FROM DUAL;
CREATE TABLE Dataset2 ("FROM", "TO", Value) AS
SELECT 10, 30, 'D' FROM DUAL UNION ALL
SELECT 70, 100, 'E' FROM DUAL;
Outputs:
VALUE | FROM | TO |
---|---|---|
A | 0 | 10 |
AD | 10 | 20 |
DB | 20 | 30 |
B | 30 | 50 |
C | 50 | 70 |
CE | 70 | 100 |
C | 100 | 150 |
X | 150 | 180 |
答案3
得分: 0
以下是您要翻译的内容:
您可以尝试此查询
WITH intersecting AS (
SELECT GREATEST(d1.StartRange, d2.StartRange) AS StartRange,
LEAST(d1.EndRange, d2.EndRange) AS EndRange,
CONCAT(d1.Value, d2.Value) AS Value,
d1.StartRange AS StartRange1,
d1.EndRange AS EndRange1
FROM Dataset1 d1
INNER JOIN Dataset2 d2
ON d1.StartRange <= d2.EndRange AND d1.EndRange >= d2.StartRange
),
partially_intersecting AS (
-- 获取左侧剩余范围,过滤掉起始等于结束的空范围
SELECT d1.StartRange AS StartRange,
COALESCE(i1.StartRange, d1.EndRange) AS EndRange,
d1.Value
FROM Dataset1 d1
LEFT JOIN intersecting i1 ON d1.StartRange = i1.StartRange1
WHERE d1.StartRange <> COALESCE(i1.StartRange, d1.EndRange)
UNION
-- 获取右侧剩余范围,过滤掉起始等于结束的空范围
SELECT COALESCE(i2.EndRange, d1.StartRange) AS StartRange,
d1.EndRange AS EndRange,
d1.Value
FROM Dataset1 d1
LEFT JOIN intersecting i2 ON d1.EndRange = i2.EndRange1
WHERE d1.EndRange <> COALESCE(i2.EndRange, d1.StartRange)
),
output_data AS (
SELECT StartRange, EndRange, Value FROM intersecting
UNION ALL
SELECT StartRange, EndRange, Value FROM partially_intersecting
)
SELECT * FROM output_data ORDER BY StartRange
查看演示[此处][1]
[1]: https://dbfiddle.uk/KNAlwsdg
注意:我已经根据您的要求将代码部分保留为原文。
英文:
You could try this query
WITH intersecting AS (
SELECT GREATEST(d1.StartRange, d2.StartRange) AS StartRange,
LEAST(d1.EndRange, d2.EndRange) AS EndRange,
CONCAT(d1.Value, d2.Value) AS Value,
d1.StartRange AS StartRange1,
d1.EndRange AS EndRange1
FROM Dataset1 d1
INNER JOIN Dataset2 d2
ON d1.StartRange <= d2.EndRange AND d1.EndRange >= d2.StartRange
),
partially_intersecting AS (
-- get left side remain range, filter out empty range which having start = end
SELECT d1.StartRange AS StartRange,
COALESCE(i1.StartRange, d1.EndRange) AS EndRange,
d1.Value
FROM Dataset1 d1
LEFT JOIN intersecting i1 ON d1.StartRange = i1.StartRange1
WHERE d1.StartRange <> COALESCE(i1.StartRange, d1.EndRange)
UNION
-- get right side remain range, filter out empty range which having start = end
SELECT COALESCE(i2.EndRange, d1.StartRange) AS StartRange,
d1.EndRange AS EndRange,
d1.Value
FROM Dataset1 d1
LEFT JOIN intersecting i2 ON d1.EndRange = i2.EndRange1
WHERE d1.EndRange <> COALESCE(i2.EndRange, d1.StartRange)
),
output_data AS (
SELECT StartRange, EndRange, Value FROM intersecting
UNION ALL
SELECT StartRange, EndRange, Value FROM partially_intersecting
)
SELECT * FROM output_data ORDER BY StartRange
See demo here
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论