2023年2月18日 17:49:28go评论100阅读模式

英文:

How can I generate all unique nested 2-tuples (nested pairings) of a set of n objects in Python?

问题

关于嵌套的二元组，我的意思是像这样的东西：((a,b),(c,(d,e)))，其中所有元组都有两个元素。我不需要元素的不同排序，只需要不同的括号放置方式。对于 items = [a, b, c, d]，有5种唯一的配对方式，它们分别是：

(((a,b),c),d)
((a,(b,c)),d)
(a,((b,c),d))
(a,(b,(c,d)))
((a,b),(c,d))

在理想的情况下，我还希望能够控制返回的元组的最大深度，因此，如果我生成了所有 items = [a, b, c, d] 的配对，且 max_depth=2，它只会返回 ((a,b),(c,d))。

出现这个问题是因为我想找到一种方法来生成非交换、非结合数上的加法结果。如果 a+b 不等于 b+a，且 a+(b+c) 不等于 (a+b)+c，那么a、b和c的所有可能和是什么？

我已经编写了一个生成所有配对的函数，但它还会返回重复的结果。

import itertools
def all_pairings(items):
    if len(items) == 2:
        yield (*items,)
    else:
        for i, pair in enumerate(itertools.pairwise(items)):
            for pairing in all_pairings(items[:i] + [pair] + items[i+2:]):
                yield pairing

例如，对于 items=[a, b, c, d]，它会两次返回 ((a,b),(c,d))，因为它在一个情况下首先将 (a,b) 配对，而在第二个情况中首先将 (c,d) 配对。

对于更多的项目，返回重复项变得更为严重。有重复项时，配对的数量以阶乘的方式增长，而没有重复项时，它按照卡特兰数（https://oeis.org/A000108）增长。

n	有重复项：(n-1)!	无重复项：(2(n-1))!/(n!(n-1)!)
1	1	1
2	1	1
3	2	2
4	6	5
5	24	14
6	120	42
7	720	132
8	5040	429
9	40320	1430
10	362880	4862

因此，我一直在尝试提出一种不需要搜索所有可能性，只需要搜索唯一可能性的算法。再次强调，也希望能够控制最大深度，但这可能可以添加到现有算法中。到目前为止，我一直没有成功找到一种方法，也没有找到任何涵盖这个具体问题的资源。我会感激任何帮助或链接到有用资源的信息。

英文:

By nested 2-tuples, I mean something like this: ((a,b),(c,(d,e))) where all tuples have two elements. I don't need different orderings of the elements, just the different ways of putting parentheses around them. For items = [a, b, c, d], there are 5 unique pairings, which are:

(((a,b),c),d)  
((a,(b,c)),d)  
(a,((b,c),d))  
(a,(b,(c,d)))  
((a,b),(c,d))

In a perfect world I'd also like to have control over the maximum depth of the returned tuples, so that if I generated all pairings of items = [a, b, c, d] with max_depth=2, it would only return ((a,b),(c,d)).

This problem turned up because I wanted to find a way to generate the results of addition on non-commutative, non-associative numbers. If a+b doesn't equal b+a, and a+(b+c) doesn't equal (a+b)+c, what are all the possible sums of a, b, and c?

I have made a function that generates all pairings, but it also returns duplicates.

import itertools
def all_pairings(items):
    if len(items) == 2:
        yield (*items,)
    else:
        for i, pair in enumerate(itertools.pairwise(items)):
            for pairing in all_pairings(items[:i] + [pair] + items[i+2:]):
                yield pairing

For example, it returns ((a,b),(c,d)) twice for items=[a, b, c, d], since it pairs up (a,b) first in one case and (c,d) first in the second case.

Returning duplicates becomes a bigger and bigger problem for larger numbers of items. With duplicates, the number of pairings grows factorially, and without duplicates it grows exponentially, according to the Catalan Numbers (https://oeis.org/A000108).

n	With duplicates: (n-1)!	Without duplicates: (2(n-1))!/(n!(n-1)!)
1	1	1
2	1	1
3	2	2
4	6	5
5	24	14
6	120	42
7	720	132
8	5040	429
9	40320	1430
10	362880	4862

Because of this, I have been trying to come up with an algorithm that doesn't need to search through all the possibilities, only the unique ones. Again, it would also be nice to have control over the maximum depth, but that could probably be added to an existing algorithm. So far I've been unsuccessful in coming up with an approach, and I also haven't found any resources that cover this specific problem. I'd appreciate any help or links to helpful resources.

答案1

得分: 7

使用递归生成器：

items = ['a', 'b', 'c', 'd']
def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(1, len(l)):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
list(split(items))

输出：

[('a', ('b', ('c', 'd'))),
 ('a', (('b', 'c'), 'd')),
 (('a', 'b'), ('c', 'd')),
 (('a', ('b', 'c'), 'd')),
 (((('a', 'b'), 'c'), 'd'))]

唯一性检查：

assert len(list(split(list(range(10)))) == 4862

反转项目的顺序：

items = ['a', 'b', 'c', 'd']
def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(len(l)-1, 0, -1):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
list(split(items))

输出：

[((('a', 'b'), 'c'), 'd'),
 ('a', ('b', 'c'), 'd'),
 ('a', ('b', 'c'), 'd'),
 ('a', (('b', 'c'), 'd')),
 ('a', ('b', ('c', 'd')))]

使用maxdepth：

items = ['a', 'b', 'c', 'd']
def split(l, maxdepth=None):
    if len(l) == 1:
        yield l[0]
    elif maxdepth is not None and maxdepth <= 0:
        yield tuple(l)
    else:
        for i in range(1, len(l)):
            for a in split(l[:i], maxdepth=maxdepth and maxdepth-1):
                for b in split(l[i:], maxdepth=maxdepth and maxdepth-1):
                    yield (a, b)
list(split(items))
# 或者
list(split(items, maxdepth=3))
# 或者
list(split(items, maxdepth=2))

输出：

[('a', ('b', ('c', 'd'))),
 ('a', (('b', 'c'), 'd')),
 (('a', 'b'), ('c', 'd')),
 (('a', ('b', 'c'), 'd')),
 (((('a', 'b'), 'c'), 'd'))]
list(split(items, maxdepth=1))
[('a', ('b', 'c', 'd')),
 (('a', 'b'), ('c', 'd')),
 (('a', 'b', 'c'), 'd')]
list(split(items, maxdepth=0))
[('a', 'b', 'c', 'd')]

英文:

Using a recursive generator:

items = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;]
def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(1, len(l)):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
        
list(split(items))

Output:

[(&#39;a&#39;, (&#39;b&#39;, (&#39;c&#39;, &#39;d&#39;))),
 (&#39;a&#39;, ((&#39;b&#39;, &#39;c&#39;), &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;)), &#39;d&#39;),
 (((&#39;a&#39;, &#39;b&#39;), &#39;c&#39;), &#39;d&#39;)]

Check of uniqueness:

assert len(list(split(list(range(10))))) == 4862

Reversed order of the items:

items = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;]
def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(len(l)-1, 0, -1):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
        
list(split(items))
[(((&#39;a&#39;, &#39;b&#39;), &#39;c&#39;), &#39;d&#39;),
 ((&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;)), &#39;d&#39;),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 (&#39;a&#39;, ((&#39;b&#39;, &#39;c&#39;), &#39;d&#39;)),
 (&#39;a&#39;, (&#39;b&#39;, (&#39;c&#39;, &#39;d&#39;)))]

With maxdepth:

items = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;]
def split(l, maxdepth=None):
    if len(l) == 1:
        yield l[0]
    elif maxdepth is not None and maxdepth &lt;= 0:
        yield tuple(l)
    else:
        for i in range(1, len(l)):
            for a in split(l[:i], maxdepth=maxdepth and maxdepth-1):
                for b in split(l[i:], maxdepth=maxdepth and maxdepth-1):
                    yield (a, b)
list(split(items))
# or
list(split(items, maxdepth=3))
# or
list(split(items, maxdepth=2))
[(&#39;a&#39;, (&#39;b&#39;, (&#39;c&#39;, &#39;d&#39;))),
 (&#39;a&#39;, ((&#39;b&#39;, &#39;c&#39;), &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;)), &#39;d&#39;),
 (((&#39;a&#39;, &#39;b&#39;), &#39;c&#39;), &#39;d&#39;)]
list(split(items, maxdepth=1))
[(&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;, &#39;c&#39;), &#39;d&#39;)]
list(split(items, maxdepth=0))
[(&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;)]

答案2

得分: 5

归功于mozway提供的算法 - 我最初的想法是以逆波兰表示法表示配对，这不会导致以下优化：

首先，我们替换了两个嵌套循环：

for a in split(l[:i]):
    for b in split(l[i:]):
        yield (a, b)

-使用 itertools.product，它本身会缓存内部 split(...) 调用的结果，同时在内部的C代码中产生配对，运行速度更快。

yield from product(split(l[:i]), split(l[i:]))

接下来，我们缓存了先前的 split(...) 调用结果。为此，我们必须放弃生成器的惰性，并确保我们的函数参数是可哈希的。明确地说，这意味着创建一个将输入列表转换为元组的包装器，并修改函数体以返回列表而不是产生。

def split(l):
    return _split(tuple(l))
def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

然后，我们使用 functools.cache 装饰函数以进行缓存。所以把它们整合在一起：

from itertools import product
from functools import cache
def split(l):
    return _split(tuple(l))
@cache
def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

对以下输入进行测试-

test = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']

-会产生以下计时：

原始版本: 5.922573089599609
修订版: 0.08888077735900879

我也验证了结果与原始版本完全匹配- 包括顺序。

再次感谢mozway提供的算法。我只是应用了一些优化来提高速度。

英文:

Full-credit to mozway for the algorithm - my original idea was to represent the pairing in reverse-polish notation, which would not have lent itself to the following optimizations:

First, we replace the two nested loops:

for a in split(l[:i]):
    for b in split(l[i:]):
        yield (a, b)

-with itertools.product, which will itself cache the results of the inner split(...) call, as well as produce the pairing in internal C code, which will run much faster.

yield from product(split(l[:i]), split(l[i:]))

Next, we cache the results of the previous split(...) calls. To do this we must sacrifice the laziness of generators, as well as ensure that our function parameters are hashable. Explicitly, this means creating a wrapper that casts the input list to a tuple, and to modify the function body to return lists instead of yielding.

def split(l):
    return _split(tuple(l))
def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

We then decorate the function with functools.cache, to perform the caching. So putting it all together:

from itertools import product
from functools import cache
def split(l):
    return _split(tuple(l))
@cache
def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

Testing for following input-

test = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;, &#39;f&#39;, &#39;g&#39;, &#39;h&#39;, &#39;i&#39;, &#39;j&#39;, &#39;k&#39;, &#39;l&#39;, &#39;m&#39;, &#39;n&#39;]`

-produces the following timings:

Original: 5.922573089599609
Revised: 0.08888077735900879

I did also verify that the results matched the original exactly- order and all.

Again, full credit to mozway for the algorithm. I've just applied a few optimizations to speed it up a bit.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

你可以使用Python生成所有唯一的嵌套2元组（嵌套配对）的方法是什么？

问题

答案1

答案2

分割并展平一个数据框成多个数据框

在Google Colab或Jupyter Notebook中使用Sherlock。

为什么我的简单Python代码不起作用，我错过了什么？

R：将数据框进行分组和扩展，仅包括列中可能的名称对。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。