Is it a good practice to instantiate a HashSet from an IEnumerable before using Contains()?
Question
The piece of code below filters an IEnumerable<T> with another, used as a blacklist. The filtered collection iterates over content fetched remotely (lazy loading, YouTube API).
IEnumerable<CustomType> contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
IEnumerable<string> blackListContent = this.FetchBlackListContent();
return contentThatCanBeHuge.Where(x => !blackListContent.Contains(x.Id));
The Enumerable.Contains method is O(n) in time complexity, so the Enumerable.Where call could take a while.
On the other hand, HashSet<T>.Contains is O(1) on average. Instantiating a HashSet<T> from an IEnumerable<T> seems to be O(n).
If the blacklist is about to be used multiple times, and without taking space complexity into account, is it a good approach to turn it into a HashSet<T> before using it or is this just premature optimization?
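A minimal sketch of the conversion being asked about, using made-up sample data in place of the remote fetches (CustomType and the Fetch* methods come from the question; the record definition and sample values here are assumptions for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's CustomType.
public record CustomType(string Id);

public static class Program
{
    public static void Main()
    {
        // Sample data standing in for FetchContentThatCanBeHuge() / FetchBlackListContent().
        IEnumerable<CustomType> contentThatCanBeHuge = new[]
        {
            new CustomType("a"), new CustomType("b"), new CustomType("c")
        };
        IEnumerable<string> blackListContent = new[] { "b" };

        // Materialize the blacklist once: HashSet<T>.Contains is O(1) on average,
        // so the Where clause no longer rescans the whole blacklist per item.
        var blackList = new HashSet<string>(blackListContent);
        var filtered = contentThatCanBeHuge.Where(x => !blackList.Contains(x.Id));

        Console.WriteLine(string.Join(",", filtered.Select(x => x.Id))); // a,c
    }
}
```

Note that Where is lazily evaluated, so the HashSet is built once up front but only consulted as the huge collection is actually enumerated.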
Answer 1
Score: 4
Let the size of blackListContent be m and the size of contentThatCanBeHuge be n.
If we don't use HashSet, time complexity is O(n * O(m)) = O(n * m), space complexity is O(1): for each item in contentThatCanBeHuge we should scan entire blackListContent.
If we use HashSet, time complexity is O(m) + O(n * O(1)) = O(n + m), space complexity is O(m):
- We create the HashSet - O(m) time complexity, O(m) space complexity.
- For each item in contentThatCanBeHuge we check it against the HashSet - O(n * O(1)) = O(n) time complexity.
So far so good: the HashSet makes the code faster, but consumes more memory.
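The O(n * m) versus O(n + m) difference above can be seen directly with a rough timing sketch (sizes and data are arbitrary assumptions, not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Benchmark
{
    public static void Main()
    {
        // m = 10,000 blacklist entries; n = 10,000 content ids.
        var blackList = Enumerable.Range(0, 10_000).Select(i => i.ToString()).ToList();
        var content = Enumerable.Range(0, 10_000).Select(i => (i * 2).ToString()).ToList();

        var sw = Stopwatch.StartNew();
        // List scan: each Contains walks the blacklist -> O(n * m).
        var slowCount = content.Count(x => !blackList.Contains(x));
        var listTime = sw.Elapsed;

        sw.Restart();
        // Build the set once (O(m)), then each lookup is O(1) on average -> O(n + m).
        var set = new HashSet<string>(blackList);
        var fastCount = content.Count(x => !set.Contains(x));
        var setTime = sw.Elapsed;

        Console.WriteLine($"list: {listTime}, set: {setTime}, same result: {slowCount == fastCount}");
    }
}
```

Both paths produce the same filtered count; only the time spent differs, with the gap growing as m increases.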


Comments