Why is iterating over a map so much slower than iterating over a slice in Golang?
Question
I was implementing a sparse matrix using a map in Golang, and I noticed that my code started taking much longer to complete after this change. After ruling out other possible causes, it seems the culprit is the iteration over the map itself. Go Playground link (doesn't work for some reason).
package main

import (
	"fmt"
	"math"
	"time"
)

func main() {
	z := 50000000

	// Fill a map and a slice with the same 50M key/value pairs.
	a := make(map[int]int, z)
	b := make([]int, z)
	for i := 0; i < z; i++ {
		a[i] = i
		b[i] = i
	}

	// Time a full iteration over the map.
	t0 := time.Now()
	for key, value := range a {
		if key != value { // never happens
			fmt.Println("a", key, value)
		}
	}
	d0 := time.Since(t0)

	// Time a full iteration over the slice.
	t1 := time.Now()
	for key, value := range b {
		if key != value { // never happens
			fmt.Println("b", key, value)
		}
	}
	d1 := time.Since(t1)

	// Report both durations and the ratio of the slower to the faster.
	fmt.Println(
		"a:", d0,
		"b:", d1,
		"diff:", math.Max(float64(d0), float64(d1))/math.Min(float64(d0), float64(d1)),
	)
}
Iterating over 50M items returns the following timings:
alix@local:~/Go/src$ go version
go version go1.3.3 linux/amd64
alix@local:~/Go/src$ go run b.go
a: 1.195424429s b: 68.588488ms diff: 17.777154632611037
I wonder, why is iterating over a map almost 20x as slow when compared to a slice?
Answer 1
Score: 21
This comes down to the representation in memory. How familiar are you with how different data structures are represented and with the concept of algorithmic complexity? Iterating over an array or slice is simple: the values are contiguous in memory. Iterating over a map, however, requires traversing the key space and doing lookups into the hash-table structure.
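To make the "contiguous in memory" point concrete, here is a minimal sketch that checks that neighbouring slice elements sit exactly one element size apart. The helper name isContiguous is made up for this illustration, and the use of package unsafe is only for inspecting addresses, not something the question's code needs.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isContiguous (hypothetical helper) reports whether each element of s
// starts exactly one element size after the previous one.
func isContiguous(s []int) bool {
	for i := 1; i < len(s); i++ {
		prev := uintptr(unsafe.Pointer(&s[i-1]))
		cur := uintptr(unsafe.Pointer(&s[i]))
		if cur-prev != unsafe.Sizeof(s[0]) {
			return false
		}
	}
	return true
}

func main() {
	s := []int{10, 20, 30, 40}
	// Print each element's address; they increase by a fixed step.
	for i := range s {
		fmt.Printf("s[%d] at %p\n", i, &s[i])
	}
	fmt.Println("contiguous:", isContiguous(s))
}
```

Map entries cannot be inspected this way, because Go does not allow taking the address of a map element; the runtime is free to place and move them as the table grows.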
Maps can dynamically accept keys of any value without using up tons of space allocating a sparse array, and lookups over the key space are still efficient, though not as fast as indexing an array. That is why hash tables are sometimes preferred over arrays, even though arrays (and slices) have a faster constant-time (O(1)) lookup given an index.
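For the sparse-matrix case in the question, one possible way to get both properties is to store the values in a slice and use the map only as an index into it. This is a sketch under assumptions, not the asker's design: the types coord, entry and sparse and their methods are hypothetical names invented here.

```go
package main

import "fmt"

// coord is a hypothetical key type for a cell position.
type coord struct{ row, col int }

// entry pairs a position with its value so entries can live in a slice.
type entry struct {
	pos coord
	val float64
}

// sparse stores values contiguously and uses the map only as an index.
type sparse struct {
	index   map[coord]int // position -> index into entries
	entries []entry
}

func newSparse() *sparse {
	return &sparse{index: make(map[coord]int)}
}

// set inserts a new cell or overwrites an existing one.
func (m *sparse) set(r, c int, v float64) {
	k := coord{r, c}
	if i, ok := m.index[k]; ok {
		m.entries[i].val = v
		return
	}
	m.index[k] = len(m.entries)
	m.entries = append(m.entries, entry{k, v})
}

// get returns the stored value, or 0 for a cell that was never set.
func (m *sparse) get(r, c int) float64 {
	if i, ok := m.index[coord{r, c}]; ok {
		return m.entries[i].val
	}
	return 0
}

// sum iterates over the slice, not the map.
func (m *sparse) sum() float64 {
	var total float64
	for _, e := range m.entries {
		total += e.val
	}
	return total
}

func main() {
	m := newSparse()
	m.set(0, 0, 1.5)
	m.set(1000, 2000, 2.5)
	m.set(0, 0, 3.0) // overwrites the first value
	fmt.Println(m.get(1000, 2000), m.get(5, 5), m.sum())
}
```

The cost is extra memory for the index and more bookkeeping if cells ever need to be deleted, so whether it pays off depends on how often the matrix is iterated versus updated.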
It all comes down to whether you need the features of this or that data structure and whether you're willing to deal with the side-effects or gotchas involved.
Answer 2
Score: 7
It seems reasonable to post my comment as an answer. The underlying structures whose iteration performance you're comparing are a hash table and an array (https://en.wikipedia.org/wiki/Hash_table vs https://en.wikipedia.org/wiki/Array_data_structure). The range abstraction is actually (speculation, I can't find the code) iterating over all the keys, accessing each value, and assigning the two to the k, v := variables. If you're not familiar with array access, it is constant time because you just add sizeof(type)*i to the starting pointer to get the item. I don't know what the internals of a map are in Golang, but I know enough to know that its memory representation, and therefore its access, is nowhere near as efficient.
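The sizeof(type)*i arithmetic can be spelled out by hand in Go, purely as an illustration. The function name elemAt is hypothetical, unsafe.Add requires Go 1.17 or later, and unlike s[i] this sketch does no bounds checking, so it is not something to use in real code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemAt (hypothetical helper) reads s[i] by computing its address by hand:
// start of the backing array plus i times the element size.
// It assumes 0 <= i < len(s); nothing here checks that.
func elemAt(s []int, i int) int {
	base := unsafe.Pointer(&s[0])
	p := unsafe.Add(base, uintptr(i)*unsafe.Sizeof(s[0]))
	return *(*int)(p)
}

func main() {
	s := []int{10, 20, 30, 40}
	for i := range s {
		// Both columns show the same value for every index.
		fmt.Println(s[i], elemAt(s, i))
	}
}
```

One multiplication and one addition reach any element, which is why indexed access costs the same no matter how large the slice is.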
The spec doesn't say much on the topic: http://golang.org/ref/spec#For_statements
If I find the time to look up the implementation of range for maps and slices/arrays, I will add some more technical details.