你可以使用Python生成所有唯一的嵌套2元组(嵌套配对)的方法是什么?

huangapple go评论66阅读模式
英文:

How can I generate all unique nested 2-tuples (nested pairings) of a set of n objects in Python?

问题

关于嵌套的二元组,我的意思是像这样的东西:((a,b),(c,(d,e))),其中所有元组都有两个元素。我不需要元素的不同排序,只需要不同的括号放置方式。对于 items = [a, b, c, d],有5种唯一的配对方式,它们分别是:

(((a,b),c),d)
((a,(b,c)),d)
(a,((b,c),d))
(a,(b,(c,d)))
((a,b),(c,d))

在理想的情况下,我还希望能够控制返回的元组的最大深度,因此,如果我生成了所有 items = [a, b, c, d] 的配对,且 max_depth=2,它只会返回 ((a,b),(c,d))

出现这个问题是因为我想找到一种方法来生成非交换、非结合数上的加法结果。如果 a+b 不等于 b+a,且 a+(b+c) 不等于 (a+b)+c,那么a、b和c的所有可能和是什么?

我已经编写了一个生成所有配对的函数,但它还会返回重复的结果。

import itertools

def all_pairings(items):
    if len(items) == 2:
        yield (*items,)
    else:
        for i, pair in enumerate(itertools.pairwise(items)):
            for pairing in all_pairings(items[:i] + [pair] + items[i+2:]):
                yield pairing

例如,对于 items=[a, b, c, d],它会两次返回 ((a,b),(c,d)),因为它在一个情况下首先将 (a,b) 配对,而在第二个情况中首先将 (c,d) 配对。

对于更多的项目,返回重复项变得更为严重。有重复项时,配对的数量以阶乘的方式增长,而没有重复项时,它按照卡特兰数(https://oeis.org/A000108)增长。

n 有重复项:(n-1)! 无重复项:(2(n-1))!/(n!(n-1)!)
1 1 1
2 1 1
3 2 2
4 6 5
5 24 14
6 120 42
7 720 132
8 5040 429
9 40320 1430
10 362880 4862

因此,我一直在尝试提出一种不需要搜索所有可能性,只需要搜索唯一可能性的算法。再次强调,也希望能够控制最大深度,但这可能可以添加到现有算法中。到目前为止,我一直没有成功找到一种方法,也没有找到任何涵盖这个具体问题的资源。我会感激任何帮助或链接到有用资源的信息。

英文:

By nested 2-tuples, I mean something like this: ((a,b),(c,(d,e))) where all tuples have two elements. I don't need different orderings of the elements, just the different ways of putting parentheses around them. For items = [a, b, c, d], there are 5 unique pairings, which are:

(((a,b),c),d)  
((a,(b,c)),d)  
(a,((b,c),d))  
(a,(b,(c,d)))  
((a,b),(c,d))

In a perfect world I'd also like to have control over the maximum depth of the returned tuples, so that if I generated all pairings of items = [a, b, c, d] with max_depth=2, it would only return ((a,b),(c,d)).

This problem turned up because I wanted to find a way to generate the results of addition on non-commutative, non-associative numbers. If a+b doesn't equal b+a, and a+(b+c) doesn't equal (a+b)+c, what are all the possible sums of a, b, and c?

I have made a function that generates all pairings, but it also returns duplicates.

import itertools

def all_pairings(items):
    if len(items) == 2:
        yield (*items,)
    else:
        for i, pair in enumerate(itertools.pairwise(items)):
            for pairing in all_pairings(items[:i] + [pair] + items[i+2:]):
                yield pairing

For example, it returns ((a,b),(c,d)) twice for items=[a, b, c, d], since it pairs up (a,b) first in one case and (c,d) first in the second case.

Returning duplicates becomes a bigger and bigger problem for larger numbers of items. With duplicates, the number of pairings grows factorially, and without duplicates it grows exponentially, according to the Catalan Numbers (https://oeis.org/A000108).

n With duplicates: (n-1)! Without duplicates: (2(n-1))!/(n!(n-1)!)
1 1 1
2 1 1
3 2 2
4 6 5
5 24 14
6 120 42
7 720 132
8 5040 429
9 40320 1430
10 362880 4862

Because of this, I have been trying to come up with an algorithm that doesn't need to search through all the possibilities, only the unique ones. Again, it would also be nice to have control over the maximum depth, but that could probably be added to an existing algorithm. So far I've been unsuccessful in coming up with an approach, and I also haven't found any resources that cover this specific problem. I'd appreciate any help or links to helpful resources.

答案1

得分: 7

使用递归生成器:

items = ['a', 'b', 'c', 'd']

def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(1, len(l)):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)

list(split(items))

输出:

[('a', ('b', ('c', 'd'))),
 ('a', (('b', 'c'), 'd')),
 (('a', 'b'), ('c', 'd')),
 (('a', ('b', 'c'), 'd')),
 (((('a', 'b'), 'c'), 'd'))]

唯一性检查:

assert len(list(split(list(range(10)))) == 4862

反转项目的顺序:

items = ['a', 'b', 'c', 'd']

def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(len(l)-1, 0, -1):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)

list(split(items))

输出:

[((('a', 'b'), 'c'), 'd'),
 ('a', ('b', 'c'), 'd'),
 ('a', ('b', 'c'), 'd'),
 ('a', (('b', 'c'), 'd')),
 ('a', ('b', ('c', 'd')))]

使用maxdepth

items = ['a', 'b', 'c', 'd']

def split(l, maxdepth=None):
    if len(l) == 1:
        yield l[0]
    elif maxdepth is not None and maxdepth <= 0:
        yield tuple(l)
    else:
        for i in range(1, len(l)):
            for a in split(l[:i], maxdepth=maxdepth and maxdepth-1):
                for b in split(l[i:], maxdepth=maxdepth and maxdepth-1):
                    yield (a, b)

list(split(items))
# 或者
list(split(items, maxdepth=3))
# 或者
list(split(items, maxdepth=2))

输出:

[('a', ('b', ('c', 'd'))),
 ('a', (('b', 'c'), 'd')),
 (('a', 'b'), ('c', 'd')),
 (('a', ('b', 'c'), 'd')),
 (((('a', 'b'), 'c'), 'd'))]

list(split(items, maxdepth=1))
[('a', ('b', 'c', 'd')),
 (('a', 'b'), ('c', 'd')),
 (('a', 'b', 'c'), 'd')]

list(split(items, maxdepth=0))
[('a', 'b', 'c', 'd')]
英文:

Using a recursive generator:

items = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;]

def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(1, len(l)):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
        
list(split(items))

Output:

[(&#39;a&#39;, (&#39;b&#39;, (&#39;c&#39;, &#39;d&#39;))),
 (&#39;a&#39;, ((&#39;b&#39;, &#39;c&#39;), &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;)), &#39;d&#39;),
 (((&#39;a&#39;, &#39;b&#39;), &#39;c&#39;), &#39;d&#39;)]

Check of uniqueness:

assert len(list(split(list(range(10))))) == 4862

Reversed order of the items:

items = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;]

def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(len(l)-1, 0, -1):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
        
list(split(items))

[(((&#39;a&#39;, &#39;b&#39;), &#39;c&#39;), &#39;d&#39;),
 ((&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;)), &#39;d&#39;),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 (&#39;a&#39;, ((&#39;b&#39;, &#39;c&#39;), &#39;d&#39;)),
 (&#39;a&#39;, (&#39;b&#39;, (&#39;c&#39;, &#39;d&#39;)))]

With maxdepth:

items = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;]

def split(l, maxdepth=None):
    if len(l) == 1:
        yield l[0]
    elif maxdepth is not None and maxdepth &lt;= 0:
        yield tuple(l)
    else:
        for i in range(1, len(l)):
            for a in split(l[:i], maxdepth=maxdepth and maxdepth-1):
                for b in split(l[i:], maxdepth=maxdepth and maxdepth-1):
                    yield (a, b)

list(split(items))
# or
list(split(items, maxdepth=3))
# or
list(split(items, maxdepth=2))

[(&#39;a&#39;, (&#39;b&#39;, (&#39;c&#39;, &#39;d&#39;))),
 (&#39;a&#39;, ((&#39;b&#39;, &#39;c&#39;), &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;)), &#39;d&#39;),
 (((&#39;a&#39;, &#39;b&#39;), &#39;c&#39;), &#39;d&#39;)]

list(split(items, maxdepth=1))

[(&#39;a&#39;, (&#39;b&#39;, &#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;), (&#39;c&#39;, &#39;d&#39;)),
 ((&#39;a&#39;, &#39;b&#39;, &#39;c&#39;), &#39;d&#39;)]

list(split(items, maxdepth=0))

[(&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;)]

答案2

得分: 5

归功于mozway提供的算法 - 我最初的想法是以逆波兰表示法表示配对,这不会导致以下优化:

首先,我们替换了两个嵌套循环:

for a in split(l[:i]):
    for b in split(l[i:]):
        yield (a, b)

-使用 itertools.product,它本身会缓存内部 split(...) 调用的结果,同时在内部的C代码中产生配对,运行速度更快。

yield from product(split(l[:i]), split(l[i:]))

接下来,我们缓存了先前的 split(...) 调用结果。为此,我们必须放弃生成器的惰性,并确保我们的函数参数是可哈希的。明确地说,这意味着创建一个将输入列表转换为元组的包装器,并修改函数体以返回列表而不是产生。

def split(l):
    return _split(tuple(l))

def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

然后,我们使用 functools.cache 装饰函数以进行缓存。所以把它们整合在一起:

from itertools import product
from functools import cache

def split(l):
    return _split(tuple(l))

@cache
def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

对以下输入进行测试-

test = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']

-会产生以下计时:

原始版本: 5.922573089599609
修订版: 0.08888077735900879

我也验证了结果与原始版本完全匹配- 包括顺序。

再次感谢mozway提供的算法。我只是应用了一些优化来提高速度。

英文:

Full-credit to mozway for the algorithm - my original idea was to represent the pairing in reverse-polish notation, which would not have lent itself to the following optimizations:

First, we replace the two nested loops:

for a in split(l[:i]):
    for b in split(l[i:]):
        yield (a, b)

-with itertools.product, which will itself cache the results of the inner split(...) call, as well as produce the pairing in internal C code, which will run much faster.

yield from product(split(l[:i]), split(l[i:]))

Next, we cache the results of the previous split(...) calls. To do this we must sacrifice the laziness of generators, as well as ensure that our function parameters are hashable. Explicitly, this means creating a wrapper that casts the input list to a tuple, and to modify the function body to return lists instead of yielding.

def split(l):
    return _split(tuple(l))

def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

We then decorate the function with functools.cache, to perform the caching. So putting it all together:

from itertools import product
from functools import cache

def split(l):
    return _split(tuple(l))

@cache
def _split(l):
    if len(l) == 1:
        return l[:1]
    
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

Testing for following input-

test = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;, &#39;f&#39;, &#39;g&#39;, &#39;h&#39;, &#39;i&#39;, &#39;j&#39;, &#39;k&#39;, &#39;l&#39;, &#39;m&#39;, &#39;n&#39;]`

-produces the following timings:

Original: 5.922573089599609
Revised: 0.08888077735900879

I did also verify that the results matched the original exactly- order and all.

Again, full credit to mozway for the algorithm. I've just applied a few optimizations to speed it up a bit.

huangapple
  • 本文由 发表于 2023年2月18日 17:49:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/75492489.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定