什么决定了映射键的迭代顺序?

huangapple go评论88阅读模式
英文:

Go: what determines the iteration order for map keys?

问题

根据Go编程语言规范,它说:

> 3. 对于映射的迭代顺序没有指定。[...]

这是可以预料的,因为映射类型可以实现为哈希表,搜索树或其他数据结构。但是在Go中,map是如何实现的呢?

换句话说,是什么决定了在以下代码中键的迭代顺序:

for k, _ := range m { fmt.Println(k) }

在我看到具有string键的映射似乎确实具有某种迭代顺序之后,我开始思考这个问题。像下面这样的程序:

package main
import ("fmt"; "time"; "rand")

func main() {
    rand.Seed(time.Now().UnixNano())
    words := [...]string{"foo", "bar", "a", "b", "c", "hello", "world",
        "0", "1", "10", "100", "123"}
    stringMap := make(map[string]byte)
    for i := range rand.Perm(len(words)) {
        stringMap[words[i]] = byte(rand.Int())
    }
    fmt.Print("stringMap keys:")
    for k, _ := range stringMap { fmt.Print(" ", k) }
    fmt.Println()
}

在我的机器上输出如下:

stringMap keys: a c b 100 hello foo bar 10 world 123 1 0

不管插入顺序如何。

具有map[byte]byte映射的等效程序也以乱序打印键,但是这里键的顺序取决于插入顺序。

这一切是如何实现的?map是否针对整数和字符串进行了特殊化处理?

英文:

The Go Programming Language Specification says:

> 3. The iteration order over maps is not specified. [...]

That's to be expected since a map type can be implemented as a hash table, or as a search tree, or as some other data structure. But how is map actually implemented in Go?

Put differently, what determines the iteration order of the keys in

for k, _ := range m { fmt.Println(k) }

I started wondering about this after I saw that a map with string keys apparently do have a certain iteration order. A program like

package main
import ("fmt"; "time"; "rand")

func main() {
    rand.Seed(time.Seconds())
    words := [...]string{"foo", "bar", "a", "b", "c", "hello", "world",
	    "0", "1", "10", "100", "123"}
    stringMap := make(map[string]byte)
    for i := range rand.Perm(len(words)) {
	    stringMap[words[i]] = byte(rand.Int())
    }
    fmt.Print("stringMap keys:")
    for k, _ := range stringMap { fmt.Print(" ", k) }
    fmt.Println()
}

prints the following on my machine:

stringMap keys: a c b 100 hello foo bar 10 world 123 1 0

regardless of the insertion order.

The equivalent program with a map[byte]byte map also prints the keys in a shuffled order, but here the key order depends on the insertion order.

How is all this implemented? Is the map specialized for integers and for strings?

答案1

得分: 31

Map在Go中作为哈希映射实现。

Go运行时使用了一个通用的哈希映射实现,该实现是用C语言编写的。map[string]Tmap[byte]T之间唯一的实现差异是:哈希函数、等价函数和复制函数。

与(某些)C++的map不同,Go的map没有完全针对整数和字符串进行特殊化。

Go release.r60中,迭代顺序与插入顺序无关,只要没有键冲突。如果存在冲突,迭代顺序会受到插入顺序的影响。这一点对于键的类型没有任何区别,所以你的程序总是以相同的顺序打印字符串键只是巧合。除非修改了map,否则迭代顺序始终相同。

然而,在最新的Go weekly release(以及预计在本月发布的Go1版本)中,迭代顺序是随机的(它从一个伪随机选择的键开始,并且哈希码计算使用一个伪随机数种子)。如果你使用周版发布(和Go1)编译你的程序,每次运行程序时迭代顺序都会不同。也就是说,无限次运行程序可能不会打印出键集的所有可能排列。示例输出:

stringMap keys: b 0 hello c world 10 1 123 bar foo 100 a
stringMap keys: hello world c 1 10 bar foo 123 100 a b 0
stringMap keys: bar foo 123 100 world c 1 10 b 0 hello a
...
英文:

Map is implemented in Go as a hashmap.

The Go run-time uses a common hashmap implementation which is implemented in C. The only implementation differences between map[string]T and map[byte]T are: hash function, equivalence function and copy function.

Unlike (some) C++ maps, Go maps aren't fully specialized for integers and for strings.

In Go release.r60, the iteration order is independent from insertion order as long as there are no key collisions. If there are collisions, iteration order is affected by insertion order. This holds true regardless of key type. There is no difference between keys of type string and keys of type byte in this respect, so it is only a coincidence that your program always printed the string keys in the same order. The iteration order is always the same unless the map is modified.

However, in the newest Go weekly release (and in Go1 which may be expected to be released this month), the iteration order is randomized (it starts at a pseudo-randomly chosen key, and the hashcode computation is seeded with a pseudo-random number). If you compile your program with the weekly release (and with Go1), the iteration order will be different each time you run your program. That said, running your program an infinite number of times probably wouldn't print all possible permutations of the key set. Example outputs:

stringMap keys: b 0 hello c world 10 1 123 bar foo 100 a
stringMap keys: hello world c 1 10 bar foo 123 100 a b 0
stringMap keys: bar foo 123 100 world c 1 10 b 0 hello a
...

答案2

得分: 1

如果规范中说迭代顺序未指定,则特定情况下的特定顺序不被排除。

关键是无论在任何情况下,甚至在某些特殊情况下,都不能依赖于该顺序。实现可以在任何给定的时刻,包括运行时,自由地更改这种行为。

英文:

If the specs say the iteration order is not specified then a specific order in specific cases is not ruled out.

The point is one cannot rely on that order in any case, not even in some special case. The implementation is free to change this behavior at any given moment, run time included.

答案3

得分: 1

请注意,如果键之间存在总序关系(如果它们是同类型的话,通常会有这种情况),那么无论插入顺序如何,顺序都可能是稳定的;如果没有其他情况,它可以允许在生成相同哈希的键上进行高效搜索。

这也可能反映了不同的底层实现 - 特别是对于字符串,人们可能希望这样做,但对于整数,您可以使用稀疏数组。

英文:

Note that it is not that odd for order to be stable regardless of insertion order if there is a total order over the keys (as there frequently may be if they are of a homogenous type); if nothing else, it can allow efficient searching over keys which generate the same hash.

This may well also reflect a different underlying implementation - in particular, this is something people might want for strings, but for integers you could use a sparse array instead.

答案4

得分: 0

为了扩展@user811773的答案,一个半随机的range子句迭代并不意味着在map中返回每个元素的机会也是相等的。请参阅https://medium.com/i0exception/map-iteration-in-go-275abb76f721和https://play.golang.org/p/GpSd8i7XZoG。

package main

import "fmt"

type intSet map[int]struct{}

func (s intSet) put(v int) { s[v] = struct{}{} }
func (s intSet) get() (int, bool) {
    for k := range s {
        return k, true
    }
    return 0, false
}
func main() {
    s := make(intSet)
    for i := 0; i < 4; i++ {
        s.put(i)
    }
    counts := make(map[int]int)
    for i := 0; i < 1024*1024; i++ {
        v, ok := s.get()
        if !ok {return}
        counts[v]++
    }
    for k, v := range counts {
        fmt.Printf("Value: %v, Count: %v\n", k, v)
    }
}
/*
Value: 1, Count: 130752
Value: 2, Count: 130833
Value: 0, Count: 655840
Value: 3, Count: 131151
*/
英文:

To extend @user811773 answer. A semi-random range clause iteration does not mean that the chance of returning each element in a map is also equal. See https://medium.com/i0exception/map-iteration-in-go-275abb76f721 and https://play.golang.org/p/GpSd8i7XZoG.

package main

import &quot;fmt&quot;

type intSet map[int]struct{}

func (s intSet) put(v int) { s[v] = struct{}{} }
func (s intSet) get() (int, bool) {
	for k := range s {
		return k, true
	}
	return 0, false
}
func main() {
	s := make(intSet)
	for i := 0; i &lt; 4; i++ {
		s.put(i)
	}
	counts := make(map[int]int)
	for i := 0; i &lt; 1024*1024; i++ {
		v, ok := s.get()
		if !ok {return}
		counts[v]++
	}
	for k, v := range counts {
		fmt.Printf(&quot;Value: %v, Count: %v\n&quot;, k, v)
	}
}
/*
Value: 1, Count: 130752
Value: 2, Count: 130833
Value: 0, Count: 655840
Value: 3, Count: 131151
*/

huangapple
  • 本文由 发表于 2012年3月8日 22:48:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/9619479.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定