sympy.utilities.iterables中multiset_permutations函数的时间复杂度

huangapple go评论73阅读模式
英文:

Time complexity of the function multiset_permutations from sympy.utilities.iterables

问题

我想知道SciPy中multiset_permutations函数的时间复杂度。

我们可以这样使用该函数:

from sympy.utilities.iterables import multiset_permutations
from sympy import factorial
[''.join(i) for i in multiset_permutations('aab')]

我想知道使用这个函数与itertools中permutations函数的时间复杂度相比如何。

我已经在文档中进行了研究,但未找到相关信息。

英文:

I'd like to know the time complexity of the function multiset_permutations from SciPy.

We could use this function as:

from sympy.utilities.iterables import multiset_permutations
from sympy import factorial
[''.join(i) for i in multiset_permutations('aab')]

I'd like to know the time complexity of using this function in comparison to the time complexity of the function permutations from itertools.

I have researched about it in the documentation, but I could not find it.

答案1

得分: 2

# 枚举字典序的序列的不同排列。
def multiset_permutations(seq):
    lst = sorted(seq)
    while True:
        yield tuple(lst)
        i = len(lst) - 2
        while True:
            if i < 0:
                return
            if lst[i] < lst[i + 1]:
                break
            i -= 1
        lst[i + 1 :] = lst[:i:-1]
        j = i + 1
        while j < len(lst) - 1:
            if lst[i] < lst[j]:
                break
            j += 1
        lst[i], lst[j] = lst[j], lst[i]

import sympy
import timeit

print(
    timeit.timeit(
        'list(sympy.utilities.iterables.multiset_permutations("aabbccddee"))',
        number=10,
        globals=globals(),
    )
)
print(
    timeit.timeit(
        'list(multiset_permutations("aabbccddee"))',
        number=10,
        globals=globals(),
    )
)
英文:

The classical algorithm looks something like this:

# Enumerates the distinct permutations of seq in lexicographic order.
def multiset_permutations(seq):
    lst = sorted(seq)
    while True:
        yield tuple(lst)
        # Lexicographic order means incrementing the rightmost element that we
        # can. This is the element to the immediate left of the maximal
        # non-increasing suffix. Find it.
        i = len(lst) - 2
        while True:
            # lst is non-increasing, so there are no more permutations.
            if i &lt; 0:
                return
            if lst[i] &lt; lst[i + 1]:
                break
            i -= 1
        # Reverse the maximal non-increasing suffix.
        lst[i + 1 :] = lst[:i:-1]
        j = i + 1
        # Increment lst[i].
        while j &lt; len(lst) - 1:
            if lst[i] &lt; lst[j]:
                break
            j += 1
        lst[i], lst[j] = lst[j], lst[i]


import sympy
import timeit

print(
    timeit.timeit(
        &#39;list(sympy.utilities.iterables.multiset_permutations(&quot;aabbccddee&quot;))&#39;,
        number=10,
        globals=globals(),
    )
)
print(
    timeit.timeit(
        &#39;list(multiset_permutations(&quot;aabbccddee&quot;))&#39;,
        number=10,
        globals=globals(),
    )
)

sympy…does not do that. I feel for the plight of library developers, but
the sympy routine takes more than three times as long on my machine and
barfs on large inputs. If you don’t need the sympy dependency for
anything else, you don’t have to take it.

It’s straightforward that the routine above runs in time linear in the
total size of the output, Θ(n P) where n is the size of the multiset and
P ≤ n! is the number of multiset permutations.

The best we can say about the sympy routine in general is that it’s
O(n<sup>2</sup> P), contrary to my previous statement. The problem lines
are

for j in multiset_permutations(None, size - 1, do):
    if j:
        yield [k] + j

By building each permutation with this concatenation strategy, spread
across n recursive calls, sympy denies Python the opportunity to do
anything more efficient here (e.g., if the arguments were flipped, maybe
Python can cannibalize j somehow). The sympy routine has several
special cases, which avoid this in some cases but definitely not all.
(It’s tedious but straightforward to verify that the rest of the code
for each call runs in time O(n), which given the call stack of size
O(n), is sufficient to prove O(n<sup>2</sup> P).)

(As Kelly points out, the base library’s itertools.permutations is
written in C and uses a reference counting hack to reuse the tuple where
possible, depending the fact that Python code cannot mutate the tuple.
So it’s Ω(n!) and O(n (n!)), with the true running time depending on the
fraction of the time that the calling code forces the tuple to be
copied. The sympy routine both returns lists and is pure Python, so it
cannot do this and is Ω(n (n!)).)

答案2

得分: 1

检查 源代码sympy 1.11 multiset_permutations

我们可以看到,如果您有所有唯一的元素,它会从排列中产生 来自 使用 itertools 导入的排列。

所以对于这种情况,复杂度是相同的,但 multiset_permutations 会增加额外的开销。

如果我们以 abcdegfhij 的排列为例,使用 itertools.permutations 大约需要 300 毫秒,而使用 multiset_permutations 大约需要 700 毫秒。

然而,permutations 做了更多的工作。参考文档中的实现如下:

def ref_permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = list(range(n))
    cycles = list(range(n, n-r, -1))
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return

它已经比实际的 itertools.permutations 慢了 40 倍。

如果您有每个元素的频率列表,即 n = list(collections.Counter(x).values()),则 multiset_permutations 的复杂度为 O(factorial(sum(n)) / prod(factorial(ni) for ni in n)),而排列的复杂度为 O(factorial(sum(n)),如果 sum(1 for _ in itertools.product(x)) / sum(1 for _ in multiset_permutations(x)) > 100,那么 multiset_permutations 应该更快。

但请注意,它的复杂度至少是不同元素的阶乘。如果达到 15,您将面临严重的问题。

英文:

Checking the source code of the sympy 1.11 multiset_permutations.

We can see that if you have all unique elements it yelds from permutations imported from itertools.

So for that case the complexity is the same, but multiset_permutations it will add an overhead.

If we take instance the permutations of abcdegfhij, it takes ~300ms with itertools.permutations, ~700ms with multiset_permutations.

However permutations does a lot more work. Taking the reference implementation from the docs.

def ref_permutations(iterable, r=None):
    # permutations(&#39;ABCD&#39;, 2) --&gt; AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --&gt; 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r &gt; n:
        return
    indices = list(range(n))
    cycles = list(range(n, n-r, -1))
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return

It is already 40 times slower than the actual itertools.permutations.

If you have a list of the frequency of each element, i.e. n = list(collections.Counter(x).values()) the complexity for multiset_permutations is O(factorial(sum(n)) / prod(factorial(ni) for ni in n)), while the permutation is O(factorial(sum(n)), if sum(1 for _ in itertools.product(x)) / sum(1 for _ in multiset_permutations(x)) &gt; 100 then multiset_permutations should be faster.

But be aware that it the complexity is at least the factorial of the number of distinct elements. If that gets to 15 you are in serious trouble.

huangapple
  • 本文由 发表于 2023年2月24日 12:17:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/75552588.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定