为什么 Python 中的 itertools.combinations() 不是 O(1) 的摊销时间复杂度?

huangapple go评论70阅读模式
英文:

Why isn't Python itertools.combinations() in O(1) amortized time complexity?

问题

这是您要翻译的内容:

"The itertools.combinations function of the CPython's standard library doesn't have an $O(1)$ amortized time complexity. When generating $k$-combinations of $n$ elements, the main issue arises from when $k$ gets close to $n$. We end up with an $O(n)$ amortized time complexity.

  1. Why such a choice from Python core developers ?
  2. Does there exist a code, whether it be a from a standard library or not, that guarantees an O(1) amortized time complexity ?

Quick details of why i ended up with amortized time complexity of $O(n)$ :
In that code, the number of combinations ensuring the pivot index is $i$ is $\binom{n-k+i}{i+1}$ and the number of array operations for such an instance is $3k-3i-1$. By multipliying both quantities and summing over all possible values of $i$, we obtain the total number of operations. Dividing this by $\binom{n}{k}$ the number of combinations, we end up with the amortized time complexity of $O(n)$.

The following self-contained C code is a rewrite of the CPython itertools.combinations function made for handling k-combinations of an array of integers between 0 and n-1, in an iterator-type style."

以下是翻译好的部分:

"CPython标准库的itertools.combinations函数没有$O(1)$摊销时间复杂度。在生成$n$个元素的$k$组合时,主要问题出现在$k$接近$n$时。我们最终得到$O(n)$的摊销时间复杂度。

  1. 为什么Python核心开发人员会做出这样的选择?
  2. 是否存在一段代码,无论是来自标准库还是其他地方,可以保证$O(1)$的摊销时间复杂度?

简要说明我为什么最终得到$O(n)$的摊销时间复杂度:
在这段代码中,确保枢轴索引为$i$的组合数为$\binom{n-k+i}{i+1}$,对于这种情况的数组操作数量为$3k-3i-1$。通过将这两个数量相乘并对所有可能的$i$值求和,我们得到总操作数量。将其除以组合数$\binom{n}{k}$,我们最终得到$O(n)$的摊销时间复杂度。

以下是处理0到$n-1$之间的整数数组的k组合的自包含C代码,以迭代器方式编写。"

英文:

The itertools.combinations function of the CPython's standard library doesn't have an $O(1)$ amortized time complexity. When generating $k$-combinations of $n$ elements, the main issue arises from when $k$ gets close to $n$. We end up with an $O(n)$ amortized time complexity.

  1. Why such a choice from Python core developers ?
  2. Does there exist a code, whether it be a from a standard library or not, that guarantees an O(1) amortized time complexity ?

Quick details of why i ended up with amortized time complexity of $O(n)$ :
In that code, the number of combinations ensuring the pivot index is $i$ is $\binom{n-k+i}{i+1}$ and the number of array operations for such an instance is $3k-3i-1$. By multipliying both quantities and summing over all possible values of $i$, we obtain the total number of operations. Dividing this by $\binom{n}{k}$ the number of combinations, we end up with the amortized time complexity of $O(n)$.

The following self-contained C code is a rewrite of the CPython itertools.combinations function made for handling k-combinations of an array of integers between 0 and n-1, in an iterator-type style.

#include <stdlib.h> // malloc
#include <stdio.h> // printf
#include <stdbool.h> //bool, true, false

/* TESTING */
int* integersInOrder(int size){
    int* array = malloc(size * sizeof(int));
    for(int i=0; i < size; i++)
        array[i] = i;
    return array;
}

void printArray(int* array, int size){
    for(int i=0; i<size; i++)
        printf("%d ", (int)array[i]);
    printf("\n");
}

/* itertools.combinations */
static const bool IS_RUNNING = true, IS_DONE = false;

bool nextCombination(int* indices, int n, int k){
    if(k > n)
        return IS_DONE;
    
    bool isFound = false;
    int i;
    // find furthest-right non-maximal index
    for(i = k - 1; i >= 0; i--){
        if(indices[i] != n - k + i){
            isFound = true;
            break;
        }
    }
    if(!isFound)
        return IS_DONE;
    
    // increase index i by 1
    indices[i]++;
    // set all indices to the right of i to be the lexicographical minimum values
    for(int j = i + 1; j < k; j++)
        indices[j] = indices[j - 1] + 1;
   
    return IS_RUNNING;
}

/* ENTRY POINT */
int main()
{
    int n = 5, k = 3;
    int* array = integersInOrder(k);
    printArray(array, k);
    
    while(IS_RUNNING == nextCombination(array, n, k)){
        printArray(array, k);
    }

    return 0;
}

答案1

得分: 2

你的解决方案的问题在于你直接在原地修改了 array。如果CPython采用了你的实现,list(itertools.combinations(range(4), 3)) 的结果会变成类似 [[1, 2, 3], [1, 2, 3], [1, 2, 3]],这会让用户感到惊讶。你需要复制 array 中前 k 个条目并将其返回,这将至少需要 O(k) 的时间,当 k 接近 n 时,这将变为 O(n)。

你无法避免这些复制操作,否则就会破坏用户的期望。

英文:

The problem with your solution is that you modify array in place. If CPython were to adapt your implementation, the result of list(itertools.combinations(range(4), 3)) would be something like [[1, 2, 3], [1, 2, 3], [1, 2, 3]], which would be surprising to users. You would need to make a copy of the first k entries in array and yield that, which will take O(k) at the minimum, which, as k approaches n, is O(n).

You can't escape these copies without breaking user's expectations.

huangapple
  • 本文由 发表于 2023年5月25日 03:54:04
  • 转载请务必保留本文链接:https://go.coder-hub.com/76326985.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定