Finding wildcard overlaps in less than O(n^2) time
Question
Let's say I have a list of tuples of equal length, where every element is either an integer or a "star", for example:
[
(1, 2, *, 4, 5),
(1, *, 3, 4, 6),
(1, *, 3, 4, 5),
(1, 2, 3, 4, 6),
(4, *, *, 5, 6),
(*, *, 1, 5, 6)
]
In this case, elements 1 and 3 overlap, 2 and 4 overlap, and 5 and 6 overlap (counting from 1).
Is there a way to determine whether any overlap exists (I don't actually need all overlaps, just whether there are none or at least one) in less time than the trivial approach of checking all possible pairs against each other, which is naturally O(n^2)?
Clarifications from comments:
- n given above is the number of rows. This is the thing that could get large. For the purposes of this problem, assume the number of columns is relatively small, say no more than 20.
- The "stars" represent EXACTLY one element.
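For reference, the trivial pairwise check mentioned above could look like the following minimal sketch. The names `STAR`, `rows_overlap` and `find_overlaps` are made up for illustration, and the star is assumed to be represented by the string "*".

```python
from itertools import combinations

STAR = "*"  # assumed representation of the wildcard


def rows_overlap(a, b):
    """Two rows overlap if every column matches or has a star on either side."""
    return all(x == STAR or y == STAR or x == y for x, y in zip(a, b))


def find_overlaps(rows):
    """Naive O(n^2) baseline: test every pair, report 1-based index pairs."""
    return [
        (i + 1, j + 1)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if rows_overlap(a, b)
    ]


rows = [
    (1, 2, STAR, 4, 5),
    (1, STAR, 3, 4, 6),
    (1, STAR, 3, 4, 5),
    (1, 2, 3, 4, 6),
    (4, STAR, STAR, 5, 6),
    (STAR, STAR, 1, 5, 6),
]
print(find_overlaps(rows))  # [(1, 3), (2, 4), (5, 6)]
```

This reproduces the three overlapping pairs listed in the example and is the baseline any faster approach has to beat.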
Answer 1
Score: 5
Since the columns can be ordered in any way, the trie concept may be less useful as long as there are columns without wildcards: those columns can partition the rows, and the more unique elements they contain, the more they reduce the search space. I'd suggest ordering the columns by (1) number of wildcards, ascending, then (2) number of unique elements, descending, and performing a DFS, prioritising the next group to queue by the highest ratio of total wildcards in it, excluding any wildcards used for the next partition.
For example, input:
(1, 2, *, 4, 5),
(1, *, 3, 4, 6),
(1, *, 3, 4, 5),
(1, 2, 3, 4, 6),
(4, *, *, 5, 6),
(*, *, 1, 5, 6)
Ordered columns right to left:
A (2, *, 1, 4, 5),
B (*, 3, 1, 4, 6),
C (*, 3, 1, 4, 5),
D (2, 3, 1, 4, 6),
E (*, *, 4, 5, 6),
F (*, 1, *, 5, 6)
First partition (groups with their wildcard ratios):
{A, C}: 1/4
{B, D, E, F}: 5/16
Second partition (groups with their wildcard ratios):
{B, D}: 1/6
{E, F}: 4/6
Etc.
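A rough sketch of how this could be coded is given below. It is one possible reading of the idea, not a definitive implementation: the function names and the "*" marker are assumptions, the wildcard ratio is taken over all columns not yet partitioned on, and rows with a star in the partitioning column are copied into every group, so the worst case can still blow up when there are many stars.

```python
from collections import defaultdict

STAR = "*"  # assumed representation of the wildcard


def order_columns(rows):
    """Column indices sorted by (wildcard count ascending, distinct values descending)."""
    def key(c):
        column = [row[c] for row in rows]
        distinct = {v for v in column if v != STAR}
        return (column.count(STAR), -len(distinct))
    return sorted(range(len(rows[0])), key=key)


def wildcard_ratio(group, remaining_cols):
    """Fraction of wildcard cells in the columns not yet partitioned on."""
    cells = len(group) * len(remaining_cols)
    if cells == 0:
        return 0.0
    return sum(row[c] == STAR for row in group for c in remaining_cols) / cells


def has_overlap(rows):
    """True if at least two rows match in every column (a star matches anything)."""
    if len(rows) < 2:
        return False
    cols = order_columns(rows)

    def dfs(group, depth):
        if len(group) < 2:
            return False  # a lone row cannot overlap with anything: prune
        if depth == len(cols):
            return True   # these rows were compatible in every column
        col, rest = cols[depth], cols[depth + 1:]
        by_value, stars = defaultdict(list), []
        for row in group:
            if row[col] == STAR:
                stars.append(row)
            else:
                by_value[row[col]].append(row)
        # Star rows are compatible with every value, so they join every child group.
        children = [same + stars for same in by_value.values()] or [stars]
        # Visit the group with the highest wildcard ratio first.
        children.sort(key=lambda g: wildcard_ratio(g, rest), reverse=True)
        return any(dfs(child, depth + 1) for child in children)

    return dfs(list(rows), 0)
```

On the example input, `wildcard_ratio` gives 1/4 for {A, C} and 5/16 for {B, D, E, F} once the last column has been used for partitioning, matching the figures above.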
Answer 2
Score: 0
A slight simplification of the other answer. The numbers form disjoint sets, and then, for each edge, add the stars to all positions that are in the incoming edge. Then, in the DFS, prioritize by number of tuples, regardless of whether they are stars. (Prune if only one tuple is left: no overlap.)
I think in the worst case we have all unique numbers and one star in the first position, and only find out at the end that it is infeasible. Considering a fixed tuple, I think it would be O(tuples).