Time complexity of hash table, separate chaining (average case)
Question
For a hash table with separate chaining, the average-case runtime complexity for insertion is O(n/m + 1), where n/m is the load factor and the + 1 is for the hash function.
Could this be considered equivalent to O(n/m) under big-O notation, since 1 is just a constant and any multiple of n/m can bound n/m + 1?
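For concreteness, here is a minimal Python sketch of where those two terms come from during an insertion (the class and names are illustrative, not part of the question):

```python
# Illustrative separate-chaining table: hashing the key is the O(1)
# "+ 1" term; scanning one chain touches n/m entries on average.

class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0

    def insert(self, key, value):
        index = hash(key) % len(self.buckets)  # O(1): the "+ 1" term
        chain = self.buckets[index]
        for i, (k, _) in enumerate(chain):     # average O(n/m) scan
            if k == key:
                chain[i] = (key, value)        # replace existing key
                return
        chain.append((key, value))             # new key
        self.size += 1
```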
Answer 1
Score: 0
> the average case runtime complexity for insertion is O(n/m + 1)
I normally simplify and remove lower-order components when expressing complexity in big-O notation, rather than having things in there that are yet to be simplified. I'd suggest you say "the number of operations" is n/m + 1 (or +2 perhaps, if the insertion itself can be considered an operation), then consider how to express a simplified big-O complexity, as you're indeed doing with this question:
> Could this be considered equivalent to O(n/m) under big-O notation since 1 is just a constant, and any multiple of n/m can bound n/m + 1?
The +1 is indeed superfluous. Your reasoning is correct. Another way of explaining it that might or might not help: no matter how slow it is to hash, if you were doing operations with even O(log(log(K))) complexity for some variable K, that K-related value would, as K increases, eventually grow large enough that the +1 would be irrelevant.
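To spell out that bounding step as a worked inequality (my addition, assuming the load factor stays at least some fixed positive constant): for any constant c > 1,

$$\frac{n}{m} + 1 \;\le\; c \cdot \frac{n}{m} \quad \text{whenever} \quad \frac{n}{m} \;\ge\; \frac{1}{c-1},$$

so taking c = 2, n/m + 1 ≤ 2(n/m) for every load factor n/m ≥ 1, which is exactly the sense in which a multiple of n/m bounds n/m + 1.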
The more interesting question is whether it's meaningful to write O(n/m), or whether you should simplify to O(n) or O(1). To answer that, let's think of n/m as the load factor: if we keep doing insertions, how will our load factor keep changing? There are two possibilities:
- if we aren't actively doing something to keep the load factor in a particular range, then it will grow linearly with insertions; if we're not varying the bucket count at all, then the m factor is constant and hence the complexity simplifies to O(n)
- if we are growing the hash table as we go, then while we don't have to, why wouldn't we keep it in some linear proportion to n (i.e. keeping the load factor beneath some fixed value), such that the insertion complexity is O(1)? [That's the obvious thing to do, and what every implementation I've ever seen that can resize actually does do; a sketch of such a policy follows this list.]
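As an illustration of that second bullet, here is a minimal sketch of such a resizing policy, building on the toy ChainedHashTable shown under the question (the 0.75 threshold is an illustrative choice, not prescribed by the answer):

```python
MAX_LOAD = 0.75  # illustrative threshold keeping n/m bounded by a constant

def insert_with_resize(table, key, value):
    table.insert(key, value)
    if table.size / len(table.buckets) > MAX_LOAD:
        old_buckets = table.buckets
        table.buckets = [[] for _ in range(2 * len(old_buckets))]
        table.size = 0
        for chain in old_buckets:      # rehash each entry exactly once
            for k, v in chain:
                table.insert(k, v)
```

Because the bucket count doubles each time, each element is rehashed only a constant number of times on average across a sequence of insertions, which is the standard amortized-O(1) argument.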
So, the most-simplified (and therefore correct) expression of the big-O complexity of further insertions is either O(1) (for resizing hash tables) or O(n) (for fixed-bucket-count tables with load factors much greater than 1).