英文:
Find overlapping ranges from a large set of data efficiently
问题
有没有一种高效的算法可以检查多个有界连续范围之间的重叠?
如果我们有一对任务,每个任务由开始时间和结束时间表示,可以使用下面的函数来检测冲突。
# 任务的形式为 (开始时间, 结束时间)
def has_clash(task_1: tuple[float, float], task_2: tuple[float, float]) -> bool:
"""检查两个任务之间是否存在冲突。"""
return (task_2[1] > task_1[0]) and (task_2[0] < task_1[1])
要检查多个任务,我们需要对任务集合中的每对任务运行上述测试,总体复杂度为 O(n^2),例如:
def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
"""检查是否有任何任务冲突,如果没有冲突则返回 True,否则返回 False。"""
if len(tasks) < 2:
return False
clash = any(
has_clash(task, other_task)
for i, (task) in enumerate(tasks[1:])
for other_task in tasks[:i]
)
return clash
是否可以通过排序或其他数学技巧改进这种复杂度?
是否有一种更高效的算法可以实现这个功能?
英文:
Is there an efficient algorithm for checking for overlaps between multiple bounded continuous ranges?
If we have a pair of tasks, each represented by a start time and an end time, a function such as the one below can detect clashes.
# tasks are of the form (start_time, end_time)
def has_clash(task_1: tuple[float, float], task_2: tuple[float, float]) -> bool:
"""Check if a clash exists between two tasks."""
return (task_2[1] > task_1[0]) and (task_2[0] < task_1[1])
To check multiple tasks, we would have to run the above test pairwise on the set of tasks, for an overall complexity of O(n^2), e.g.:
def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
"""Check if any tasks clash, return True if no clashes, false otherwise."""
if len(tasks) < 2:
return False
clash = any(
has_clash(task, other_task)
for i, (task) in enumerate(tasks[1:])
for other_task in tasks[:i]
)
return clash
Can this complexity be improved upon by e.g. sorting or some other mathematical magic?
Is there an algorithm that does this more efficiently?
答案1
得分: 2
按照开始时间对列表进行排序,然后只需比较连续条目的最后结束时间与下一个开始时间。这个算法的复杂度为O(nlogn)用于排序和O(n)用于迭代配对。
def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
"""Check if any tasks clash, return True if no clashes, false otherwise."""
sorted_tasks = sorted(tasks)
return any(a2 > b1 for (a1, a2), (b1, b2) in zip(sorted_tasks, sorted_tasks[1:]))
print(has_any_clash([])) # False
print(has_any_clash([(0, 4)])) # False
print(has_any_clash([(1, 5), (0, 4)])) # True
print(has_any_clash([(5, 8), (0, 4), (6, 7)])) # True
print(has_any_clash([(9, 10), (0, 4), (6, 7)])) # False
如果你想要一个列表来表示哪些任务发生了冲突(这些任务可能不仅仅是连续的任务),那就需要稍微复杂一些。除了上述代码之外:
- 使用一个按照结束时间排序的
heap
来跟踪"正在运行"的任务 - 按照开始时间对任务进行排序,在每次迭代中:
- 弹出
heap
中结束时间小于当前任务开始时间的任务 heap
中剩余的任务与当前任务发生冲突- 将当前任务添加到
heap
中
- 弹出
英文:
Sort the list by start time, then just compare the last end to the next start time of consecutive entries. The complexity for this is just O(nlogn) for sorting and O(n) for iterating the pairs.
def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
"""Check if any tasks clash, return True if no clashes, false otherwise."""
sorted_tasks = sorted(tasks)
return any(a2 > b1 for (a1, a2), (b1, b2) in zip(sorted_tasks, sorted_tasks[1:]))
print(has_any_clash([])) # False
print(has_any_clash([(0, 4)])) # False
print(has_any_clash([(1, 5), (0, 4)])) # True
print(has_any_clash([(5, 8), (0, 4), (6, 7)])) # True
print(has_any_clash([(9, 10), (0, 4), (6, 7)])) # False
If you want a list which tasks clash (those could be more that just the consecutive ones) that is a bit more involved, but not by much. In addition to the above:
- keep track of "running" tasks, using a
heap
, sorted by end-time - iterate tasks sorted by start-time, in each iteration:
- pop tasks from running heap while end-time is < current tasks's start time
- all remaining tasks in running heap clash with current task
- add current task to running heap
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论