Golang的maps/hashmaps,优化迭代速度

huangapple go评论88阅读模式
英文:

Golang maps/hashmaps, optimizing for iteration speed

问题

当将一个生产环境的NodeJS应用迁移到Golang时,我注意到Golang的原生Map的迭代速度实际上比NodeJS慢。

我提出了一种替代方案,牺牲了删除/插入速度,而是提高了迭代速度,通过暴露一个可以迭代的数组,并在一个单独的Map中存储键=>索引对。

虽然这个解决方案有效,并且有显著的性能提升,但我想知道是否有更好的解决方案可以研究一下。

我的设置是很少从哈希映射中删除东西,只有添加和替换是常见的,对于这种情况,这个实现是有效的,尽管它更像是一个变通方法而不是一个真正的解决方案。

这些映射总是由一个整数索引,保存任意数据。

FastMap:    500000 Iterations -  0.153000ms
Native Map: 500000 Iterations -  4.988000ms
/*
  为迭代速度进行优化的无序哈希映射。
  将值存储在数组中,并在一个单独的哈希映射中保存键=>索引映射
*/

type FastMapEntry[K comparable, T any] struct {
	Key   K
	Value T
}

type FastMap[K comparable, T any] struct {
	m       map[K]int            // 存储键=>数组索引映射
	entries []FastMapEntry[K, T] // 存储条目及其键的数组
	len     int                  // 映射的总大小
}

func MakeFastMap[K comparable, T any]() *FastMap[K, T] {
	return &FastMap[K, T]{
		m:       make(map[K]int),
		entries: make([]FastMapEntry[K, T], 0),
	}
}

func (m *FastMap[K, T]) Set(key K, value T) {
	index, exists := m.m[key]
	if exists {
		// 如果键已存在,则替换值
		m.entries[index] = FastMapEntry[K, T]{
			Key:   key,
			Value: value,
		}
	} else {
		// 将键=>索引对存储在映射中,并将值添加到条目中。总大小增加一
		m.m[key] = m.len
		m.entries = append(m.entries, FastMapEntry[K, T]{
			Key:   key,
			Value: value,
		})
		m.len++
	}
}

func (m *FastMap[K, T]) Has(key K) bool {
	_, exists := m.m[key]

	return exists
}

func (m *FastMap[K, T]) Get(key K) (value T, found bool) {
	index, exists := m.m[key]
	if exists {
		found = true
		value = m.entries[index].Value
	}

	return
}

func (m *FastMap[K, T]) Remove(key K) bool {
	index, exists := m.m[key]
	if exists {
		// 从条目中删除值
		m.entries = append(m.entries[:index], m.entries[index+1:]...)
		// 删除键=>索引映射
		delete(m.m, key)
		m.len--

		for i := index; i < m.len; i++ {
			// 从当前索引开始,将所有索引映射向上移动
			m.m[m.entries[i].Key] = i
		}
	}

	return exists
}

func (m *FastMap[K, T]) Entries() []FastMapEntry[K, T] {
	return m.entries
}

func (m *FastMap[K, T]) Len() int {
	return m.len
}

运行的测试代码如下:

// s.Variations是一个包含约500k条记录的原生映射

start := time.Now()
iterations := 0
for _, variation := range s.Variations {
	if variation.Id > 0 {

	}
	iterations++
}
log.Printf("Native Map: %d Iterations -  %fms\n", iterations, float64(time.Since(start).Microseconds())/1000)

// 将数据复制到FastMap中
fm := helpers.MakeFastMap[state.VariationId, models.ItemVariation]()
for key, variation := range s.Variations {
	fm.Set(key, variation)
}

start = time.Now()
iterations = 0
for _, variation := range fm.Entries() {
	if variation.Value.Id > 0 {

	}
	iterations++
}
log.Printf("FastMap: %d Iterations -  %fms\n", iterations, float64(time.Since(start).Microseconds())/1000)
英文:

When migrating a production NodeJS application to Golang I've noticed that iteration of GO's native Map is actually slower than Node.

I've come up with an alternative solution that sacrifices removal/insertion speed with iteration speed instead, by exposing an array that can be iterated over and storing key=>index pairs inside a separate map.

While this solution works, and has a significant performance increase, I was wondering if there is a better solution to this that I could look into.

The setup I have is that its very rare something is removed from the hashmaps, only additions and replacements are common for which this implementation 'works', albeit feels like a workaround more than an actual solution.

The maps are always indexed by an integer, holding arbitrary data.

FastMap:    500000 Iterations -  0.153000ms
Native Map: 500000 Iterations -  4.988000ms
/*
  Unordered hash map optimized for iteration speed.
  Stores values in an array and holds key=&gt;index mappings inside a separate hashmap
*/

type FastMapEntry[K comparable, T any] struct {
	Key   K
	Value T
}

type FastMap[K comparable, T any] struct {
	m       map[K]int            // Stores key =&gt; array index mappings
	entries []FastMapEntry[K, T] // Array holding entries and their keys
	len     int                  // Total map size
}

func MakeFastMap[K comparable, T any]() *FastMap[K, T] {
	return &amp;FastMap[K, T]{
		m:       make(map[K]int),
		entries: make([]FastMapEntry[K, T], 0),
	}
}

func (m *FastMap[K, T]) Set(key K, value T) {
	index, exists := m.m[key]
	if exists {
		// Replace if key already exists
		m.entries[index] = FastMapEntry[K, T]{
			Key:   key,
			Value: value,
		}
	} else {
		// Store the key=&gt;index pair in the map and add value to entries. Increase total len by one
		m.m[key] = m.len
		m.entries = append(m.entries, FastMapEntry[K, T]{
			Key:   key,
			Value: value,
		})
		m.len++
	}
}

func (m *FastMap[K, T]) Has(key K) bool {
	_, exists := m.m[key]

	return exists
}

func (m *FastMap[K, T]) Get(key K) (value T, found bool) {
	index, exists := m.m[key]
	if exists {
		found = true
		value = m.entries[index].Value
	}

	return
}

func (m *FastMap[K, T]) Remove(key K) bool {
	index, exists := m.m[key]
	if exists {
		// Remove value from entries
		m.entries = append(m.entries[:index], m.entries[index+1:]...)
		// Remove key=&gt;index mapping
		delete(m.m, key)
		m.len--

		for i := index; i &lt; m.len; i++ {
			// Move all index mappings up, starting from current index
			m.m[m.entries[i].Key] = i
		}
	}

	return exists
}

func (m *FastMap[K, T]) Entries() []FastMapEntry[K, T] {
	return m.entries
}

func (m *FastMap[K, T]) Len() int {
	return m.len
}

The test code that was ran is:

// s.Variations is a native map holding ~500k records

start := time.Now()
iterations := 0
for _, variation := range s.Variations {
	if variation.Id &gt; 0 {

	}
	iterations++
}
log.Printf(&quot;Native Map: %d Iterations -  %fms\n&quot;, iterations, float64(time.Since(start).Microseconds())/1000)

// Copy data into FastMap
fm := helpers.MakeFastMap[state.VariationId, models.ItemVariation]()
for key, variation := range s.Variations {
	fm.Set(key, variation)
}

start = time.Now()
iterations = 0
for _, variation := range fm.Entries() {
	if variation.Value.Id &gt; 0 {

	}
	iterations++
}
log.Printf(&quot;FastMap: %d Iterations -  %fms\n&quot;, iterations, float64(time.Since(start).Microseconds())/1000)

答案1

得分: 1

我认为这种比较和基准测试有点离题。Go语言中的map实现与你的实现有很大的区别,主要是因为它需要覆盖更广泛的条目范围,编译时使用的结构体实际上有点重(虽然不是很重,它们基本上只存储了一些关于你在map中使用的类型的信息),而且实现方法也不同!Go语言中的map实际上是一个哈希表(你的实现显然不是,或者说它是,但实际的哈希实现是委托给你内部持有的m map的)。

导致你得到这个结果的另一个因素是,如果你看一下这段代码:

for _, variation := range fm.Entries() {
    if variation.Value.Id > 0 {

    }
    iterations++
}

基本上,你正在遍历一个切片,相对于遍历一个map来说,这更容易且更快速,你有一个对数组的视图,它将相同类型的元素放在一起,有道理,对吧?
要进行更好的比较,你应该这样做:

for _, y := range fastMap.m {
		_ = fastMap.Entries()[y].Value + 1 // 一些简单的计算
}

如果你真的追求性能,一个编写良好的哈希函数和一个固定大小的数组将是你的最佳选择。

英文:

I think this kind of comparison and benchmarking is a little off-topic. Go implementation of map is quite different from your implementation, basically because it needs to cover a wider area of entries, the structs used in compile time are actually kind of heavy (not so much though, they basically store some information about the types you use in your map and so on), and the implementation approach is different! Go implementation of map is basically a hashmap (yours is not obviously, or it is, but the actual hashing implementation is delegated to the m map you hold internally).

One of the other factors makes you get this result is, if you take a look at this:

for _, variation := range fm.Entries() {
    if variation.Value.Id &gt; 0 {

    }
    iterations++
}

Basically, you're iterating over a slice, which is much easier and faster to iterate rather than a map, you have a view to an array, which holds elements of the same types next to each other, makes sense, right?
What you should do to make a better comparison would be something like this:

for _, y := range fastMap.m {
		_ = fastMap.Entries()[y].Value + 1 // some simple calculation
}

If you're really looking for performance, a well written hash function and a fixed size array would be your best choice.

huangapple
  • 本文由 发表于 2022年7月18日 02:50:22
  • 转载请务必保留本文链接:https://go.coder-hub.com/73014496.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定