在使用 Contains() 前从一个 IEnumerable 实例化一个 HashSet 是一个好的实践吗?

huangapple go评论83阅读模式
英文:

Is it a good practice to instantiate a HashSet from an IEnumerable before using Contains()?

问题

以下是翻译好的部分:

下面的代码片段使用另一个IEnumerable<T>作为黑名单来过滤内容。被过滤的集合是通过远程获取的内容进行迭代(延迟加载,YouTube API)。

IEnumerable<CustomType> contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
IEnumerable<string> blackListContent = this.FetchBlackListContent();
return contentThatCanBeHuge.Where(x => !blackListContent.Contains(x.Id));

Enumerable.Contains方法的时间复杂度为O(n),因此Enumerable.Where调用可能需要一些时间。

另一方面,HashSet<T>.Contains的时间复杂度为O(1)。从IEnumerable<T>实例化一个HashSet<T>似乎是O(n)。

如果黑名单将被多次使用,而且不考虑空间复杂度,那么在使用之前将其转换为HashSet<T>是否是一个好方法,还是这只是过早的优化?

英文:

The piece of code below filters an IEnumerable<T> with another, used as a blacklist. The filtered collection iterates over content fetched remotely (lazy loading, YouTube API).

IEnumerable<CustomType> contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
IEnumerable<string> blackListContent = this.FetchBlackListContent();
return contentThatCanBeHuge.Where(x => !blackListContent.Contains(x.Id));

The Enumerable.Contains method is O(n) in time complexity, so the Enumerable.Where call could take a while.

In the other hand, HashSet<T>.Contains is O(1). Instantiating a HashSet<T> from an IEnumerable<T> seems to be O(n).

If the blacklist is about to be used multiple times, and without taking space complexity into account, is it a good approach to turn it into a HashSet<T> before using it or is this just premature optimization?

答案1

得分: 4

blackListContent 的大小为 mcontentThatCanBeHuge 的大小为 n

如果我们不使用 HashSet,时间复杂度为 O(n * O(m)) = O(n * m),空间复杂度为 O(1):对于 contentThatCanBeHuge 中的每一项,我们都需要扫描整个 blackListContent

如果我们使用 HashSet,时间复杂度为 O(m) + O(n * O(1)) = O(n + m),空间复杂度为 O(m)

  1. 我们创建一个 HashSet - 时间复杂度 O(m),空间复杂度 O(m)
  2. 对于 contentThatCanBeHuge 中的每一项,我们需要与 HashSet 进行比较 - 时间复杂度 O(n * O(1))

到目前为止,使用 HashSet 使代码运行更快,但会消耗更多内存。

英文:

Let size of blackListContent be m and size of contentThatCanBeHuge is n.

If we don't use HashSet, time complexity is O(n * O(m)) = O(n * m), space complexity is O(1): for each item in contentThatCanBeHuge we should scan entire blackListContent.

If we use HashSet, time complexity is O(m) + O(n * O(1)) = O(n + m), space complexity is O(m):

  1. We create HashSet - O(m) time complexity, O(m) space complexity.
  2. For each item in contentThatCanBeHuge we should check it with rspect of HashSet - O(n * O(1)) time complexity.

So far so good HashSet makes the code faster but we consumes more memory.

huangapple
  • 本文由 发表于 2023年8月4日 00:56:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/76830157.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定