2023年8月4日 00:56:05go评论112阅读模式

英文:

Is it a good practice to instantiate a HashSet from an IEnumerable before using Contains()?

问题

以下是翻译好的部分：

下面的代码片段使用另一个IEnumerable<T>作为黑名单来过滤内容。被过滤的集合是通过远程获取的内容进行迭代（延迟加载，YouTube API）。

IEnumerable&lt;CustomType&gt; contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
IEnumerable&lt;string&gt; blackListContent = this.FetchBlackListContent();
return contentThatCanBeHuge.Where(x =&gt; !blackListContent.Contains(x.Id));

Enumerable.Contains方法的时间复杂度为O(n)，因此Enumerable.Where调用可能需要一些时间。

另一方面，HashSet<T>.Contains的时间复杂度为O(1)。从IEnumerable<T>实例化一个HashSet<T>似乎是O(n)。

如果黑名单将被多次使用，而且不考虑空间复杂度，那么在使用之前将其转换为HashSet<T>是否是一个好方法，还是这只是过早的优化？

英文:

The piece of code below filters an IEnumerable<T> with another, used as a blacklist. The filtered collection iterates over content fetched remotely (lazy loading, YouTube API).

IEnumerable&lt;CustomType&gt; contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
IEnumerable&lt;string&gt; blackListContent = this.FetchBlackListContent();
return contentThatCanBeHuge.Where(x =&gt; !blackListContent.Contains(x.Id));

The Enumerable.Contains method is O(n) in time complexity, so the Enumerable.Where call could take a while.

In the other hand, HashSet<T>.Contains is O(1). Instantiating a HashSet<T> from an IEnumerable<T> seems to be O(n).

If the blacklist is about to be used multiple times, and without taking space complexity into account, is it a good approach to turn it into a HashSet<T> before using it or is this just premature optimization?

答案1

得分: 4

让 blackListContent 的大小为 m，contentThatCanBeHuge 的大小为 n。

如果我们不使用 HashSet，时间复杂度为 O(n * O(m)) = O(n * m)，空间复杂度为 O(1)：对于 contentThatCanBeHuge 中的每一项，我们都需要扫描整个 blackListContent。

如果我们使用 HashSet，时间复杂度为 O(m) + O(n * O(1)) = O(n + m)，空间复杂度为 O(m)：

我们创建一个 HashSet - 时间复杂度 O(m)，空间复杂度 O(m)。
对于 contentThatCanBeHuge 中的每一项，我们需要与 HashSet 进行比较 - 时间复杂度 O(n * O(1))。

到目前为止，使用 HashSet 使代码运行更快，但会消耗更多内存。

英文:

Let size of blackListContent be m and size of contentThatCanBeHuge is n.

If we don't use HashSet, time complexity is O(n * O(m)) = O(n * m), space complexity is O(1): for each item in contentThatCanBeHuge we should scan entire blackListContent.

If we use HashSet, time complexity is O(m) + O(n * O(1)) = O(n + m), space complexity is O(m):

We create HashSet - O(m) time complexity, O(m) space complexity.
For each item in contentThatCanBeHuge we should check it with rspect of HashSet - O(n * O(1)) time complexity.

So far so good HashSet makes the code faster but we consumes more memory.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在使用 Contains() 前从一个 IEnumerable 实例化一个 HashSet 是一个好的实践吗？

问题

答案1

Asp.net Core HealthChecks : AddDbContextCheck HealthReportEntry description and exception are empty when Unhealthy

如何在LINQ中从逗号分隔的字符串中找到精确匹配的数字。

Selected Card via Picker is not displaying data

玩家太慢，无法跳跃，并且以对角线移动得很快。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。