July 3, 2023, 12:15:23 · go


Finding wildcard overlaps in less than O(n^2) time

Question

Let's say I have a list of tuples of equal length, where every element is either an integer or a "star", for example:

[
(1, 2, *, 4, 5),
(1, *, 3, 4, 6),
(1, *, 3, 4, 5),
(1, 2, 3, 4, 6),
(4, *, *, 5, 6),
(*, *, 1, 5, 6)
]

In this case, tuples 1 and 3 overlap, 2 and 4 overlap, and 5 and 6 overlap. (Two tuples overlap when, at every position, their elements are equal or at least one of them is a star.)

Is there a way to determine whether there is an overlap (I don't actually need all overlaps, just an answer to whether there are no overlaps or at least one) in less time than the trivial approach of checking all possible pairs against each other, which is naturally O(n^2)?

Clarifications from comments:

n given above is the number of rows. This is the thing that could get large. For the purposes of this problem, assume the number of columns is relatively small, say no more than 20.
The "stars" represent EXACTLY one element.
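
For reference, the trivial pairwise check that the question wants to beat can be sketched as follows. This is only a baseline under stated assumptions: `None` stands in for the star, and the names `STAR`, `ROWS`, `rows_overlap`, `any_overlap_naive` and `overlapping_pairs` are made up for illustration.

```python
from itertools import combinations

STAR = None  # assumption: None stands in for the "*" wildcard

# The example input from the question.
ROWS = [
    (1, 2, STAR, 4, 5),
    (1, STAR, 3, 4, 6),
    (1, STAR, 3, 4, 5),
    (1, 2, 3, 4, 6),
    (4, STAR, STAR, 5, 6),
    (STAR, STAR, 1, 5, 6),
]


def rows_overlap(a, b):
    """Two rows overlap if every position is equal or has a star on either side."""
    return all(x is STAR or y is STAR or x == y for x, y in zip(a, b))


def any_overlap_naive(rows):
    """O(n^2) baseline: test every pair, stopping at the first overlap."""
    return any(rows_overlap(a, b) for a, b in combinations(rows, 2))


def overlapping_pairs(rows):
    """All overlapping pairs as 1-based (i, j) indices, for checking by hand."""
    indexed = list(enumerate(rows, start=1))
    return [(i, j) for (i, a), (j, b) in combinations(indexed, 2)
            if rows_overlap(a, b)]


print(any_overlap_naive(ROWS))   # True
print(overlapping_pairs(ROWS))   # [(1, 3), (2, 4), (5, 6)]
```

With at most 20 columns, each comparison is cheap; the cost that grows is the number of pairs.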

Answer 1

Score: 5

Since the columns can be ordered in any way, the trie concept may be less useful as long as there are columns without wildcards: those columns can partition the rows, and the more unique elements they contain, the more they reduce the search space. I'd suggest ordering the columns by (1) number of wildcards, ascending, then (2) number of unique elements, descending, and performing a DFS, prioritising the next group to queue by the highest ratio of total wildcards in it, excluding any wildcards used for the next partition.

For example, input:

(1, 2, *, 4, 5),
(1, *, 3, 4, 6),
(1, *, 3, 4, 5),
(1, 2, 3, 4, 6),
(4, *, *, 5, 6),
(*, *, 1, 5, 6)

Ordered columns right to left:

A (2, *, 1, 4, 5),
B (*, 3, 1, 4, 6),
C (*, 3, 1, 4, 5),
D (2, 3, 1, 4, 6),
E (*, *, 4, 5, 6),
F (*, 1, *, 5, 6)

First partition:

{A, C} {B, D, E, F}
 1/4       5/16     wildcard ratio

Second partition:

{B, D} {E, F}
 1/6    4/6         wildcard ratio

Etc.
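
One possible reading of the approach above, as a rough Python sketch rather than a definitive implementation. The assumptions here are mine: `None` stands in for the star, rows with a star in the partitioning column are copied into every group, ties between columns are broken toward the rightmost column (only to mirror the worked example), and the names `STAR`, `ROWS` and `has_overlap` are hypothetical.

```python
from collections import defaultdict

STAR = None  # assumption: None stands in for the "*" wildcard

ROWS = [
    (1, 2, STAR, 4, 5),
    (1, STAR, 3, 4, 6),
    (1, STAR, 3, 4, 5),
    (1, 2, 3, 4, 6),
    (4, STAR, STAR, 5, 6),
    (STAR, STAR, 1, 5, 6),
]


def has_overlap(rows):
    """Return True if at least one pair of rows overlaps."""
    rows = list(rows)
    if len(rows) < 2:
        return False
    width = len(rows[0])

    def column_key(c):
        stars = sum(1 for r in rows if r[c] is STAR)
        distinct = len({r[c] for r in rows if r[c] is not STAR})
        # Fewest wildcards first, then most distinct values, then rightmost.
        return (stars, -distinct, -c)

    order = sorted(range(width), key=column_key)

    def wildcard_ratio(group, depth):
        """Share of stars among the cells of the columns not yet partitioned on."""
        remaining = order[depth:]
        cells = len(group) * len(remaining)
        if cells == 0:
            return 0.0
        return sum(1 for r in group for c in remaining if r[c] is STAR) / cells

    def search(group, depth):
        if len(group) < 2:
            return False  # a single row cannot overlap with anything: prune
        if depth == width:
            return True   # two or more rows survived every column
        col = order[depth]
        buckets, stars = defaultdict(list), []
        for r in group:
            if r[col] is STAR:
                stars.append(r)
            else:
                buckets[r[col]].append(r)
        # A star matches any value, so starred rows join every bucket.
        groups = [b + stars for b in buckets.values()] if buckets else [stars]
        groups.sort(key=lambda g: wildcard_ratio(g, depth + 1), reverse=True)
        return any(search(g, depth + 1) for g in groups)

    return search(rows, 0)


print(has_overlap(ROWS))  # True
```

On the example input this partitions on the last column first, giving the groups {A, C} and {B, D, E, F} with ratios 1/4 and 5/16 as above. Because starred rows are copied into every group, columns with many stars can still blow up the work, so this sketch does not guarantee a sub-quadratic worst case.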

Answer 2

Score: 0

A slight simplification of the other answer. The numbers form disjoint sets, and then, for each edge, add the stars to all positions that are in the incoming edge. Then, in the DFS, prioritize by number of tuples, regardless of whether they are stars. (Prune if only one tuple is left: no overlap.)

I think in the worst case we have all unique numbers and one star in the first position, and only find out at the end that it is infeasible. Considering a fixed tuple, I think it would be O(tuples).
