Time complexity of hash table, separate chaining (average case)

Question

For a hash table with separate chaining, the average-case runtime complexity for insertion is O(n/m + 1), where n/m is the load factor and the +1 is for the hash function.

Could this be considered equivalent to O(n/m) under big-O notation since 1 is just a constant, and any multiple of n/m can bound n/m + 1?
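To make the two cost components concrete, here is a minimal sketch of separate chaining in Go (the type and function names are illustrative, not from any particular library): computing the hash is the constant "+1", and walking the chosen bucket's chain is the expected O(n/m) part.

```go
package main

import "fmt"

// entry is one node in a bucket's chain.
type entry struct {
	key   string
	value int
	next  *entry
}

// table is a separate-chaining hash table with a fixed bucket count m.
type table struct {
	buckets []*entry // the m buckets
	n       int      // number of stored keys
}

// hash is a simple FNV-1a-style hash; computing it costs O(1),
// which is the "+1" in O(n/m + 1).
func (t *table) hash(key string) int {
	h := uint32(2166136261)
	for i := 0; i < len(key); i++ {
		h ^= uint32(key[i])
		h *= 16777619
	}
	return int(h % uint32(len(t.buckets)))
}

// insert hashes the key, then walks one bucket's chain, whose
// expected length is the load factor n/m, giving O(n/m + 1) overall.
func (t *table) insert(key string, value int) {
	i := t.hash(key) // O(1)
	for e := t.buckets[i]; e != nil; e = e.next { // expected O(n/m)
		if e.key == key {
			e.value = value // key already present: overwrite
			return
		}
	}
	// Not found: prepend a new entry to the chain.
	t.buckets[i] = &entry{key: key, value: value, next: t.buckets[i]}
	t.n++
}

func main() {
	t := &table{buckets: make([]*entry, 8)}
	t.insert("a", 1)
	t.insert("b", 2)
	t.insert("a", 3) // overwrite, n stays 2
	fmt.Println(t.n)
}
```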

Answer 1

Score: 0

> the average case runtime complexity for insertion is O(n/m + 1)

I normally simplify and remove lower-order components when expressing complexity in big-O notation, rather than having things in there that are yet to be simplified. I'd suggest you say "the number of operations" is n/m + 1 (or +2 perhaps, if the insertion itself can be considered an operation), then consider how to express a simplified big-O complexity, as you're indeed doing with this question:

> Could this be considered equivalent to O(n/m) under big-O notation since 1 is just a constant, and any multiple of n/m can bound n/m + 1?

The +1 is indeed superfluous. Your reasoning is correct. Another way of explaining it that might or might not help: no matter how slow it is to hash, if you were doing operations with even O(log(log(K))) complexity for some variable K, that K-related value would, as K increases, eventually grow large enough that the +1 would be irrelevant.
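One way to make that bounding argument explicit (a sketch assuming the load factor n/m is at least 1, so that a constant multiple of n/m dominates the +1):

```latex
\[
  \frac{n}{m} + 1 \;\le\; \frac{n}{m} + \frac{n}{m} \;=\; 2\cdot\frac{n}{m}
  \qquad \text{whenever } \frac{n}{m} \ge 1,
\]
\[
  \text{so } O\!\left(\frac{n}{m} + 1\right) = O\!\left(\frac{n}{m}\right),
  \text{ the constant factor } 2 \text{ being absorbed by big-O.}
\]
```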

The more interesting question is whether it's meaningful to write O(n/m), or whether you should simplify further to O(n) or O(1). To answer that, let's think of n/m as the load factor: if we keep doing insertions, how will our load factor keep changing? There are two possibilities:

  • if we aren't actively doing something to keep the load factor in a particular range, then it will grow linearly with insertions; if we're not varying the bucket count at all, then the m factor is constant and hence the complexity simplifies to O(n)

  • if we are growing the hash table as we go, then while we don't have to, why wouldn't we keep it in some linear proportion to n (i.e. keeping the load factor beneath some fixed value), such that the insertion complexity is O(1)? [That's the obvious thing to do, and what every implementation I've ever seen that can resize actually does do.] A sketch of this appears after the list.
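As a sketch of that second bullet, the earlier table type could grow on a threshold like this (the 0.75 threshold and doubling factor are illustrative assumptions; for comparison, Java's HashMap defaults to a 0.75 load factor, and Go's runtime maps grow at an average of about 6.5 entries per bucket):

```go
// maxLoadFactor is an illustrative cap on n/m; keeping the load
// factor beneath a fixed constant makes expected insertion O(1).
const maxLoadFactor = 0.75

// insertGrowing behaves like insert, but first doubles the bucket
// count whenever the load factor would exceed maxLoadFactor.
func (t *table) insertGrowing(key string, value int) {
	if float64(t.n+1) > maxLoadFactor*float64(len(t.buckets)) {
		t.grow()
	}
	t.insert(key, value)
}

// grow doubles m and rehashes every entry; this O(n) pass happens
// rarely enough (at geometrically spaced sizes) that it amortizes
// to O(1) per insertion.
func (t *table) grow() {
	old := t.buckets
	t.buckets = make([]*entry, 2*len(old))
	t.n = 0 // insert below re-counts the rehashed entries
	for _, e := range old {
		for ; e != nil; e = e.next {
			t.insert(e.key, e.value)
		}
	}
}
```

With the load factor bounded by a fixed constant, n/m = O(1), which is exactly why the O(n/m) expression collapses to O(1) for resizing tables.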

So, the most-simplified (and therefore correct) expression of the big-O complexity of further insertions is either O(1) (for resizing hash tables) or O(n) (for fixed-bucket-count tables with load factors much greater than 1).
