从带有标签的区间列表中查找基本区间。

huangapple go评论72阅读模式
英文:

Finding elementary intervals from a list of intervals with label(s)

问题

I am looking for an algorithm that can efficiently separate overlapped intervals and sort them.

寻找一个能有效地分离重叠的区间并对它们进行排序的算法。

Above question closely resembles mine, but my case has one more condition: each interval has (a) label(s).

上面的问题与我的情况非常相似,但我的情况多了一个条件:每个区间都有一个或多个标签。

In python, I represent each interval with the following class:

在Python中,我使用以下类来表示每个区间:

from typing import Set


class Interval:
   def __init__(self, a:float, b:float, labels:Set[str]):
      self.a = a
      self.b = b
      self.labels = labels

Here, a represents the left end of the interval, b is the right end, and labels is a set of strings which will be united with another when an Interval has intersection with another.

在这里,a 表示区间的左端,b 表示右端,而 labels 是一组字符串,当一个 Interval 与另一个区间相交时,它们将会合并。

For instance, given intervals like followings:

例如,给定以下的区间:

intervalA = Interval(0, 3, {"foo"})
intervalB = Interval(2, 4, {"bar"})
intervals = [intervalA, intervalB]

My desired output will look like below:

我的期望输出将如下所示:

desiredOutput = [Interval(0, 2, {"foo"}), Interval(2, 3, {"foo", "bar"}), Interval(3, 4, {"bar"})]
actualOutput = separateIntervals(intervals)
# desiredOutput == actualOutput should be evaluated as `True`
英文:

I am looking for an algorithm that can efficiently separate overlapped intervals and sort them.

https://stackoverflow.com/questions/8501423/finding-elementary-intervals-in-overlapping-intervals

Above question closely resembles mine, but my case has one more condition: each interval has (a) label(s).

In python, I represent each interval with the following class:

from typing import Set


class Interval:
   def __init__(self, a:float, b:float, labels:Set[str]):
      self.a = a
      self.b = b
      self.labels = labels

Here, a represents the left end of the interval, b is the right end, andlabels is a set of strings which will be united with another when an Interval has intersection with another.

For instance, given intervals like followings:

intervalA = Interval(0, 3, {"foo"})
intervalB = Interval(2, 4, {"bar"})
intervals = [intervalA, intervalB]

My desired output will look like below:

desiredOutput = [Interval(0, 2, "foo"), Interval(2, 3, {"foo", "bar"}), Interval(3, 4, {"bar"})]
actualOutput = separateIntervals(intervals)
# desiredOutput == actualOutput should be evaluated as `True`

答案1

得分: 2

以下是您提供的Python代码的翻译:

我在Python方面不太擅长所以创建了一个没有类的示例

我们将所有的区间端点分开每个端点与字段1/0连接用于开始/结束和标签字符串

按照端点位置进行排序1/0作为次要键因此在相同位置的情况下区间结束会在另一个区间开始之前)。

遍历排序后的列表创建具有当前标签集的新区间然后更新标签集


intervals = [(1,6,'a'), (2,4,'b'), (3,5,'c'), (4,8,'d')]

ends = []
for x in intervals:
    ends.append((x[0], 1, x[2]))
    ends.append((x[1], 0, x[2]))

ends.sort()
labelset = set()
labelset.add(ends[0][2])
start = ends[0][0]
outp = []
for e in ends:
    if e[0] > start:
        outp.append((start, e[0], list(labelset)))
    if e[1]:
        labelset.add(e[2])
    else:
        labelset.remove(e[2])
    start = e[0]

print(outp)

# 输出结果:
# [(1, 2, ['a']), (2, 3, ['a', 'b']), (3, 4, ['a', 'c', 'b']), 
# (4, 5, ['d', 'a', 'c']), (5, 6, ['d', 'a']), (6, 8, ['d'])]
英文:

I am not good in Python, so made example without classes.

We separate all interval endpoints, every is joined with field 1/0 for start/end and label string.

Sort them by enpoint position (1/0 works as secondary key, so interval end goes before another interval start in case of the same position).

Walk through sorted list, creating new intervals with current label set, then updating label set

intervals = [(1,6,'a'), (2,4,'b'), (3,5,'c'), (4,8,'d')]

ends = []
for x in intervals:
    ends.append((x[0], 1, x[2]))
    ends.append((x[1], 0, x[2]))

ends.sort()
labelset = set()
labelset.add(ends[0][2])
start = ends[0][0]
outp = []
for e in ends:
    if e[0] > start:
        outp.append((start, e[0], list(labelset)))
    if e[1]:
        labelset.add(e[2])
    else:
        labelset.remove(e[2])
    start = e[0]

print(outp)

>>>[(1, 2, ['a']), (2, 3, ['a', 'b']), (3, 4, ['a', 'c', 'b']), 
        (4, 5, ['d', 'a', 'c']), (5, 6, ['d', 'a']), (6, 8, ['d'])]

答案2

得分: 1

以下是翻译好的内容:

算法

以下代码描述了一个基于堆栈的算法来处理区间。这不是递归或分层的,这样我更容易理解。区间以成对方式处理,每对产生一个要添加到输出的区间,以及一些要推回堆栈的区间。

对不起,我的算法描述是使用 c++ 编写的(我假设它对大多数独立于语言的开发人员来说至少是可读的)。至少它具有精确的语义和易于验证的副作用。尽管如此,我只对单个示例进行了测试,所以您需要仔细检查它。

示例代码(已更新)

#include <iostream>
#include <stack>
#include <vector>

struct Interval {
    Interval(int s, int e, const std::string& l)
        : start(s)
        , end(e)
        , label(l) {
    }
    int start, end;
    std::string label;
};

using Intervals = std::vector<Interval>;
using Stack = std::stack<Interval>;

using std::cout, std::endl;

int main(int argc, const char *argv[]) {
    Intervals vals = {
        { 0, 3, "foo" },
        { 2, 4, "bar" }
    };

    // 按照它们的起始值和结束值对区间进行排序。
    std::sort(vals.begin(), vals.end(), [](const auto& a, const auto& b) {
        return (a.start < b.start) or (a.start == b.start and a.end < b.end);
        return false;
    });

    // 将区间放在堆栈上,使第一个位于顶部。
    Stack stack(vals.rbegin(), vals.rend());


    Intervals results;
    // 只要堆栈上至少有两个区间,就对第一对区间进行操作。
    while (stack.size() > 1) {
        auto a = stack.top(); stack.pop();
        auto b = stack.top(); stack.pop();

        // 如果没有重叠,输出第一个区间并将第二个区间返回到堆栈。
        if (a.end <= b.start) {
            results.emplace_back(a);
            stack.push(b);
        }
        // 根据来自OP的失败测试用例添加。
        // 如果a和b从同一点开始,我们必须调整所有以该点开头的区间。
        else if (a.start == b.start) {
            auto end = std::min(a.end, b.end);
            std::string label = a.label + "-" + b.label;
            Stack tmp;
            while (stack.size() and stack.top().start == a.start) {
                auto next = stack.top();
                label += "-" + next.label;
                stack.pop();
                tmp.emplace(end, next.end, next.label);
            }
            while (tmp.size()) {
                stack.emplace(tmp.top());
                tmp.pop();
            }
            stack.emplace(end, b.end, b.label);
            results.emplace_back(a.start, end, label);
        }
        // 如果a和b部分重叠,我们有三个区间
        // (a.start, b.start), (b.start, a.end) 和 (a.end,
        // b.end)。输出第一个并将接下来的两个推回堆栈。
        else if (a.end > b.start and b.end > a.end) {
            results.emplace_back(a.start, b.start, a.label);
            stack.emplace(a.end, b.end, b.label);
            stack.emplace(b.start, a.end, a.label + "-" + b.label);
        }
        // 如果b是a的后缀,我们有两个区间
        // (a.start, b.start) 和 (b.start, a.end)。输出第一个并将第二个推到堆栈上。
        else if (a.end > b.start and a.end == b.end) {
            results.emplace_back(a.start, b.start, a.label);
            stack.emplace(b.start, a.end, a.label + "-" + b.label);
        }
        // 如果b完全被a覆盖,我们有三个区间
        // (a.start, b.start), (b.start, b.end), 和 (b.end, a.end)。
        else if (a.end > b.start and a.end > b.end) {
            results.emplace_back(a.start, b.start, a.label);
            stack.emplace(b.end, a.end, a.label);
            stack.emplace(b.start, b.end, a.label + "-" + b.label);
        }
    }

    if (stack.size() > 0)
        results.emplace_back(stack.top());

    for (const auto& [a, b, label] : results)
        cout << a << " " << b << " " << label << endl;

    return 0;
}

输入

    Intervals vals = {
        { 0, 3, "foo" },
        { 0, 4, "bar" },
        { 0, 5, "que" }
    };

输出

0 3 foo-bar-que
3 4 bar-que
4 5 que
英文:

Algorithm

The following code describes a stack-based algorithm to process the intervals. This is not recursive or hierarchical making it easier for me to follow mentally. The intervals are processed in pairs and each pair produces an interval to add to the output as well as some intervals to push back onto the stack.

I apologize that my algorithmic description is in c++ (I am presuming it is at least readable for most developers independent of language). It at least has the side-effect of precise semantics and easy validation. That said, I only tested it with the single example, so you will want to check it closely.

Sample Code (Updated)

#include <iostream>
#include <stack>
#include <vector>

struct Interval {
    Interval(int s, int e, const std::string& l)
        : start(s)
        , end(e)
        , label(l) {
    }
    int start, end;
    std::string label;
};

using Intervals = std::vector<Interval>;
using Stack = std::stack<Interval>;

using std::cout, std::endl;

int main(int argc, const char *argv[]) {
    Intervals vals = {
        { 0, 3, "foo" },
        { 2, 4, "bar" }
    };

    // Sort the intervals by their start value and end value.
    std::sort(vals.begin(), vals.end(), [](const auto& a, const auto& b) {
        return (a.start < b.start) or (a.start == b.start and a.end < b.end);
        return false;
    });

    // Place the intervals on a stack such that the first is on top.
    Stack stack(vals.rbegin(), vals.rend());


    Intervals results;
    // While there are at least two intervals on the stack, operate on
    // the first two intervals.
    while (stack.size() > 1) {
        auto a = stack.top(); stack.pop();
        auto b = stack.top(); stack.pop();

        // If there is no overalap, output the first interval and
        // return the second interval to the stack.
        if (a.end <= b.start) {
            results.emplace_back(a);
            stack.push(b);
        }
        // Added based on failed test-case from OP.
        // If and b start at the same point, we have to adjust all
        // intervals that start at that point.
        else if (a.start == b.start) {
            auto end = std::min(a.end, b.end);
            std::string label = a.label + "-" + b.label;
            Stack tmp;
            while (stack.size() and stack.top().start == a.start) {
                auto next = stack.top();
                label += "-" + next.label;
                stack.pop();
                tmp.emplace(end, next.end, next.label);
            }
            while (tmp.size()) {
                stack.emplace(tmp.top());
                tmp.pop();
            }
            stack.emplace(end, b.end, b.label);
            results.emplace_back(a.start, end, label);
        }
        // If a and b partially overlap, we have three intervals
        // (a.start, b.start), (b.start, a.end) and (a.end,
        // b.end). Output the first and push the next two back to the
        // stack.
        else if (a.end > b.start and b.end > a.end) {
            results.emplace_back(a.start, b.start, a.label);
            stack.emplace(a.end, b.end, b.label);
            stack.emplace(b.start, a.end, a.label + "-" + b.label);
        }
        // If b is a suffix of a, we have two intervals (a.start,
        // b.start) and (b.start, a.end). Output the first and push
        // the second on the stack.
        else if (a.end > b.start and a.end == b.end) {
            results.emplace_back(a.start, b.start, a.label);
            stack.emplace(b.start, a.end, a.label + "-" + b.label);
        }
        // If b is complete covered by a, we have three intervals
        // (a.start, b.start), (b.start, b.end), and (b.end, a.end).
        else if (a.end > b.start and a.end > b.end) {
            results.emplace_back(a.start, b.start, a.label);
            stack.emplace(b.end, a.end, a.label);
            stack.emplace(b.start, b.end, a.label + "-" + b.label);
        }
    }

    if (stack.size() > 0)
        results.emplace_back(stack.top());

    for (const auto& [a, b, label] : results)
        cout << a << " " << b << " " << label << endl;

    return 0;
}

Input

    Intervals vals = {
{ 0, 3, "foo" },
{ 0, 4, "bar" },
{ 0, 5, "que" }
};

Output

0 3 foo-bar-que
3 4 bar-que
4 5 que

答案3

得分: 0

I'm the maintainer of a Python library called portion that aims to ease manipulating intervals. As part of this library, there's an IntervalDict class that allows storing data along with intervals. It acts merely as a classical dict structure in which keys can be Interval instances.

Using this structure, you can easily solve your problem in a few steps.

First, let's encode your data:

>>> import portion as P
>>> A = P.IntervalDict({P.closed(0, 3): 'foo'})
>>> B = P.IntervalDict({P.closed(2, 4): 'bar'})

The IntervalDict class proposes a combine method allowing to "merge" two IntervalDicts together while applying a function on the shared keys (i.e., on intervals that overlap):

>>> output = A.combine(B, how=list)
>>> output.as_dict()
{
  [0,2): 'foo', 
  [2,3]: ['foo', 'bar'],
  (3,4]: 'bar'
}

Hope it helps.

See https://github.com/AlexandreDecan/portion for the documentation.

英文:

I'm the maintainer of a Python library called portion that aims to ease manipulating intervals. As part of this library, there's an IntervalDict class that allows to store data along with intervals. It acts merely as a classical dict structure in which keys can be Interval instances.

Using this structure, you can easily solve your problem in a few steps.

First, let's encode your data:

>>> import portion as P
>>> A = P.IntervalDict({P.closed(0, 3): 'foo'})
>>> B = P.IntervalDict({P.closed(2, 4): 'bar'})

The IntervalDict class proposes a combine method allowing to "merge" two IntervalDicts together while applying a function on the shared keys (i.e., on intervals that overlap):

>>> output = A.combine(B, how=list)
>>> output.as_dict()
{
  [0,2): 'foo', 
  [2,3]: ['foo', 'bar'],
  (3,4]: 'bar'
}

Hope it helps.

See https://github.com/AlexandreDecan/portion for the documentation.

huangapple
  • 本文由 发表于 2023年5月11日 08:33:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76223401.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定