C#: 字典或键值对列表

huangapple go评论61阅读模式
英文:

C#: Dictionary or list of pairs

问题

我正在读取大量数据,需要将其存储为相应值的配对。然后,我将需要查找每个键对应的值,几乎每个键都要查找一次。

由于我只会针对每个键进行一次查找,是否值得使用字典,或者在性能方面使用List<Pair<...>>会更好?

英文:

I'm reading a large amount of data that I need to store as pairs of corresponding values. Then I'll need to look up the corresponding values for the keys, once for almost each key.

Since I'll be searching for each key only once, is it worth it to use a Dictionary, or would it be better in terms of performance to use List&lt;Pair&lt;...>> here?

答案1

得分: 1

这取决于你拥有多少数据以及计算字典哈希的复杂程度。

对于列表,你的平均查找时间将是N/2。对于字典,哈希解析始终相同,但这些哈希解析可能相当昂贵。因此,对于任何数据集,都存在某个数量N,其中列表的平均查找成本开始超过哈希解析。在这个数量之下,列表更快。在这个数量之上,字典更快。

这个分界点取决于你的数据。我记得很多年前我们曾经使用10个项目作为一个经验法则,但我不知道那个规则到底有多有效。但实际上,了解的最佳方法是在你的实际数据样本上尝试使用两者。

另外,我们还需要考虑排序的影响。如果你知道你需要精确一次性地查找列表中的每个项目,那么基本未排序数据的总成本为N个项目 * N/2平均查找时间。这是一个O(n^2)算法,效率不高。但很多时候,你可以对数据进行排序,以便知道查找将按顺序发生,然后遍历列表。这可能会更加高效,是一个O(n*log(n))算法,更有可能胜过字典。

英文:

It depends on how much data you have and how complicated it is to calculate the dictionary hashes.

Your average lookup time for the List will be N/2. For the Dictionary it will always be the same hash resolution, but these hash resolutions can be fairly expensive. Therefore, for any data set there exists some number N where the List's average lookup cost starts to exceed the hash resolution. Below this number, the List is faster. Above the number, the Dictionary is faster.

Where this breaking point happens depends on your data. I recall many years ago we used to use just 10 items as a rule of thumb, but I have no idea how valid that really is. Really, though, the best way to find out is to actually try both on a sample of your real data.

We also need to throw in sorting as a further wrinkle. If you know you'll need to find every item in a list exactly once, then the total cost for the basic unsorted data is N items * N/2 average lookup time. This is a O(n<sup>2</sup>) algorithm, which is not great. But very often you can sort the data so you know the look-ups will happen sequentially, and then walk the List. This can be MUCH more efficient &mdash; O(n*log(n)) &mdash; and is more likely to beat the dictionary.

huangapple
  • 本文由 发表于 2023年5月24日 21:17:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/76323989.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定