英文:
Why isn't Python itertools.combinations() in O(1) amortized time complexity?
问题
这是您要翻译的内容:
"The itertools.combinations function of the CPython's standard library doesn't have an $O(1)$ amortized time complexity. When generating $k$-combinations of $n$ elements, the main issue arises from when $k$ gets close to $n$. We end up with an $O(n)$ amortized time complexity.
- Why such a choice from Python core developers ?
- Does there exist a code, whether it be a from a standard library or not, that guarantees an O(1) amortized time complexity ?
Quick details of why i ended up with amortized time complexity of $O(n)$ :
In that code, the number of combinations ensuring the pivot index is $i$ is $\binom{n-k+i}{i+1}$ and the number of array operations for such an instance is $3k-3i-1$. By multipliying both quantities and summing over all possible values of $i$, we obtain the total number of operations. Dividing this by $\binom{n}{k}$ the number of combinations, we end up with the amortized time complexity of $O(n)$.
The following self-contained C code is a rewrite of the CPython itertools.combinations function made for handling k-combinations of an array of integers between 0 and n-1, in an iterator-type style."
以下是翻译好的部分:
"CPython标准库的itertools.combinations函数没有$O(1)$摊销时间复杂度。在生成$n$个元素的$k$组合时,主要问题出现在$k$接近$n$时。我们最终得到$O(n)$的摊销时间复杂度。
- 为什么Python核心开发人员会做出这样的选择?
- 是否存在一段代码,无论是来自标准库还是其他地方,可以保证$O(1)$的摊销时间复杂度?
简要说明我为什么最终得到$O(n)$的摊销时间复杂度:
在这段代码中,确保枢轴索引为$i$的组合数为$\binom{n-k+i}{i+1}$,对于这种情况的数组操作数量为$3k-3i-1$。通过将这两个数量相乘并对所有可能的$i$值求和,我们得到总操作数量。将其除以组合数$\binom{n}{k}$,我们最终得到$O(n)$的摊销时间复杂度。
以下是处理0到$n-1$之间的整数数组的k组合的自包含C代码,以迭代器方式编写。"
英文:
The itertools.combinations function of the CPython's standard library doesn't have an $O(1)$ amortized time complexity. When generating $k$-combinations of $n$ elements, the main issue arises from when $k$ gets close to $n$. We end up with an $O(n)$ amortized time complexity.
- Why such a choice from Python core developers ?
- Does there exist a code, whether it be a from a standard library or not, that guarantees an O(1) amortized time complexity ?
Quick details of why i ended up with amortized time complexity of $O(n)$ :
In that code, the number of combinations ensuring the pivot index is $i$ is $\binom{n-k+i}{i+1}$ and the number of array operations for such an instance is $3k-3i-1$. By multipliying both quantities and summing over all possible values of $i$, we obtain the total number of operations. Dividing this by $\binom{n}{k}$ the number of combinations, we end up with the amortized time complexity of $O(n)$.
The following self-contained C code is a rewrite of the CPython itertools.combinations function made for handling k-combinations of an array of integers between 0 and n-1, in an iterator-type style.
#include <stdlib.h> // malloc
#include <stdio.h> // printf
#include <stdbool.h> //bool, true, false
/* TESTING */
int* integersInOrder(int size){
int* array = malloc(size * sizeof(int));
for(int i=0; i < size; i++)
array[i] = i;
return array;
}
void printArray(int* array, int size){
for(int i=0; i<size; i++)
printf("%d ", (int)array[i]);
printf("\n");
}
/* itertools.combinations */
static const bool IS_RUNNING = true, IS_DONE = false;
bool nextCombination(int* indices, int n, int k){
if(k > n)
return IS_DONE;
bool isFound = false;
int i;
// find furthest-right non-maximal index
for(i = k - 1; i >= 0; i--){
if(indices[i] != n - k + i){
isFound = true;
break;
}
}
if(!isFound)
return IS_DONE;
// increase index i by 1
indices[i]++;
// set all indices to the right of i to be the lexicographical minimum values
for(int j = i + 1; j < k; j++)
indices[j] = indices[j - 1] + 1;
return IS_RUNNING;
}
/* ENTRY POINT */
int main()
{
int n = 5, k = 3;
int* array = integersInOrder(k);
printArray(array, k);
while(IS_RUNNING == nextCombination(array, n, k)){
printArray(array, k);
}
return 0;
}
答案1
得分: 2
你的解决方案的问题在于你直接在原地修改了 array
。如果CPython采用了你的实现,list(itertools.combinations(range(4), 3))
的结果会变成类似 [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
,这会让用户感到惊讶。你需要复制 array
中前 k 个条目并将其返回,这将至少需要 O(k) 的时间,当 k 接近 n 时,这将变为 O(n)。
你无法避免这些复制操作,否则就会破坏用户的期望。
英文:
The problem with your solution is that you modify array
in place. If CPython were to adapt your implementation, the result of list(itertools.combinations(range(4), 3))
would be something like [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
, which would be surprising to users. You would need to make a copy of the first k entries in array
and yield that, which will take O(k) at the minimum, which, as k approaches n, is O(n).
You can't escape these copies without breaking user's expectations.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论