为什么我的哈希表实现如此慢?

huangapple go评论97阅读模式
英文:

Go: Why is my hashtable implementation so slow?

问题

所以我正在尝试创建一个超轻、故意占用大量内存但非常快速的哈希表,用于非常快速的查找,我不关心内存使用情况,也不关心它是否会偶尔出错。

基本上,它只是创建一个巨大的数组(是的,是数组,而不是切片),使用修改过的FNVa哈希(修改为仅在数组范围内给出哈希)对字符串进行哈希,然后使用哈希作为数组索引保存或查找值。理论上,这应该是存储和检索键值对的最快方式。

这是我的基准测试代码:

package main
import (
    "fmt"
    "time"
)

const dicsize250 = 2097152000 // tested 115 collisions

type Dictionary250_uint16 struct {
    dictionary [dicsize250]uint16
}

func (d *Dictionary250_uint16) Add(s string, v uint16) {
    i := id(s,dicsize250)
    d.dictionary[i]=v
    return
}

func (d *Dictionary250_uint16) Delete(s string) {
    i := id(s,dicsize250)
    d.dictionary[i]=0
    return
}

func (d *Dictionary250_uint16) Exists(s string) bool {
    i := id(s,dicsize250)
    if d.dictionary[i]==0 {
        return false
    } else {
        return true
    }
}

func (d *Dictionary250_uint16) Find(s string) uint16 {
    i := id(s,dicsize250)
    return d.dictionary[i]
}

// This is a FNVa hash algorithm, modified to limit to dicsize
func id(s string, dicsize uint64) uint64 {
    var hash uint64 = 2166136261
    for _, c := range s {
        hash = (hash^uint64(c))*16777619
    }
    return hash%dicsize
}

var donothing bool
func main() {

    dic := new(Dictionary250_uint16)
    dic.Add(`test1`,10)
    dic.Add(`test2`,20)
    dic.Add(`test3`,30)
    dic.Add(`test4`,40)
    dic.Add(`test5`,50)

    mc := make(map[string]uint16)
    mc[`test1`]=10
    mc[`test2`]=10
    mc[`test3`]=10
    mc[`test4`]=10
    mc[`test5`]=10

    var t1 uint
    var t2 uint
    var t3 uint
    donothing = true

    // Dic hit
    t1 = uint(time.Now().UnixNano())
    for i:=0; i<50000000; i++ {
        if dic.Exists(`test4`) {
            donothing = true
        }
    }
    t3 = uint(time.Now().UnixNano())
    t2 = t3-t1
    fmt.Println("Dic (hit) took ",t2)

    // Dic miss
    t1 = uint(time.Now().UnixNano())
    for i:=0; i<50000000; i++ {
        if dic.Exists(`whate`) {
            donothing = true
        }
    }
    t3 = uint(time.Now().UnixNano())
    t2 = t3-t1
    fmt.Println("Dic (miss) took ",t2)

    // Map hit
    t1 = uint(time.Now().UnixNano())
    for i:=0; i<50000000; i++ {
        _,ok := mc[`test4`]
        if ok {
            donothing=true
        }
    }
    t3 = uint(time.Now().UnixNano())
    t2 = t3-t1
    fmt.Println("Map (hit) took ",t2)

    // Map miss
    t1 = uint(time.Now().UnixNano())
    for i:=0; i<50000000; i++ {
        _,ok := mc[`whate`]
        if ok {
            donothing=true
        }
    }
    t3 = uint(time.Now().UnixNano())
    t2 = t3-t1
    fmt.Println("Map (miss) took ",t2)

    donothing = false
}

我得到的结果是:

Dic (hit) took  2,858,604,059
Dic (miss) took  2,457,173,526
Map (hit) took  1,574,306,146
Map (miss) took  2,525,206,080

基本上,我的哈希表实现比仅使用map要慢得多,特别是在命中时。我不明白为什么会这样,因为map是一个相对复杂的实现,它永远不会有任何冲突,并且执行了更多的计算。而我的实现非常简单,依赖于具有所有可能索引的大型数组。

我做错了什么?

英文:

So I'm trying to make a super light, deliberately memory heavy, yet very fast hashtable for very fast lookups where I don't care about memory usage and I don't care if it makes a rare mistake.

Basically it just creates a ginormous array (yes array, not slice), hashes a string using a modified FNVa hash (modified to give only a hash within the array bounds) and then saves or lookups up the value using the hash as the array index. In theory this should be the fastest possible way to store and retrieve a key=>value pair.

This is my benchmark:

package main
import (
&quot;fmt&quot;
&quot;time&quot;
)
const dicsize250 = 2097152000 // tested 115 collisions
type Dictionary250_uint16 struct {
dictionary [dicsize250]uint16
}
func (d *Dictionary250_uint16) Add(s string, v uint16) {
i := id(s,dicsize250)
d.dictionary[i]=v
return
}
func (d *Dictionary250_uint16) Delete(s string) {
i := id(s,dicsize250)
d.dictionary[i]=0
return
}
func (d *Dictionary250_uint16) Exists(s string) bool {
i := id(s,dicsize250)
if d.dictionary[i]==0 {
return false
} else {
return true
}
}
func (d *Dictionary250_uint16) Find(s string) uint16 {
i := id(s,dicsize250)
return d.dictionary[i]
}
// This is a FNVa hash algorithm, modified to limit to dicsize
func id(s string, dicsize uint64) uint64 {
var hash uint64 = 2166136261
for _, c := range s {
hash = (hash^uint64(c))*16777619
}
return hash%dicsize
}
var donothing bool
func main() {
dic := new(Dictionary250_uint16)
dic.Add(`test1`,10)
dic.Add(`test2`,20)
dic.Add(`test3`,30)
dic.Add(`test4`,40)
dic.Add(`test5`,50)
mc := make(map[string]uint16)
mc[`test1`]=10
mc[`test2`]=10
mc[`test3`]=10
mc[`test4`]=10
mc[`test5`]=10
var t1 uint
var t2 uint
var t3 uint
donothing = true
// Dic hit
t1 = uint(time.Now().UnixNano())
for i:=0; i&lt;50000000; i++ {
if dic.Exists(`test4`) {
donothing = true
}
}
t3 = uint(time.Now().UnixNano())
t2 = t3-t1
fmt.Println(&quot;Dic (hit) took &quot;,t2)
// Dic miss
t1 = uint(time.Now().UnixNano())
for i:=0; i&lt;50000000; i++ {
if dic.Exists(`whate`) {
donothing = true
}
}
t3 = uint(time.Now().UnixNano())
t2 = t3-t1
fmt.Println(&quot;Dic (miss) took &quot;,t2)
// Map hit
t1 = uint(time.Now().UnixNano())
for i:=0; i&lt;50000000; i++ {
_,ok := mc[`test4`]
if ok {
donothing=true
}
}
t3 = uint(time.Now().UnixNano())
t2 = t3-t1
fmt.Println(&quot;Map (hit) took &quot;,t2)
// Map miss
t1 = uint(time.Now().UnixNano())
for i:=0; i&lt;50000000; i++ {
_,ok := mc[`whate`]
if ok {
donothing=true
}
}
t3 = uint(time.Now().UnixNano())
t2 = t3-t1
fmt.Println(&quot;Map (miss) took &quot;,t2)
donothing = false
}

The results I get are:

Dic (hit) took  2,858,604,059
Dic (miss) took  2,457,173,526
Map (hit) took  1,574,306,146
Map (miss) took  2,525,206,080

Basically my hashtable implementation is a lot slower, especially on hits, than just using a map. I don't see how this is possible since map is a heavy implementation (in comparison) which doesn't ever have any collisions, and does a lot more calculations. Whereas my implementation is super simple and relies on having a massive array of all possible indices.

What I am doing wrong?

答案1

得分: 3

首先,与内置的映射相比,您使用了非常大量的内存,但这是您提到的一种权衡。

使用标准库的基准工具。它将为您提供一个可靠的基础,更容易进行性能分析,并消除很多猜测。我有一段时间将您的代码剪切并粘贴到基准测试中:

func BenchmarkDictHit(b *testing.B) {
    donothing = true

    dic := new(Dictionary250_uint16)
    dic.Add(`test1`, 10)
    dic.Add(`test2`, 20)
    dic.Add(`test3`, 30)
    dic.Add(`test4`, 40)
    dic.Add(`test5`, 50)

    // The initial Dict allocation is very expensive!
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        if dic.Exists(`test4`) {
            donothing = true
        }
    }
}

func BenchmarkDictMiss(b *testing.B) {
    donothing = true

    dic := new(Dictionary250_uint16)
    dic.Add(`test1`, 10)
    dic.Add(`test2`, 20)
    dic.Add(`test3`, 30)
    dic.Add(`test4`, 40)
    dic.Add(`test5`, 50)

    // The initial Dict allocation is very expensive!
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        if dic.Exists(`test6`) {
            donothing = true
        }
    }
}

func BenchmarkMapHit(b *testing.B) {
    donothing = true
    mc := make(map[string]uint16)
    mc[`test1`] = 10
    mc[`test2`] = 10
    mc[`test3`] = 10
    mc[`test4`] = 10
    mc[`test5`] = 10

    b.ResetTimer()

    // Map hit
    for i := 0; i < b.N; i++ {
        _, ok := mc[`test4`]
        if ok {
            donothing = true
        }
    }

    donothing = false
}
func BenchmarkMapMiss(b *testing.B) {
    donothing = true
    mc := make(map[string]uint16)
    mc[`test1`] = 10
    mc[`test2`] = 10
    mc[`test3`] = 10
    mc[`test4`] = 10
    mc[`test5`] = 10

    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        _, ok := mc[`test6`]
        if ok {
            donothing = true
        }
    }
    donothing = false
}

在没有ResetTimer()调用的情况下,初始分配的切片会主导基准测试时间,即使在运行中进行了摊销,也会严重扭曲结果。重置后,基准测试的时间看起来是有序的:

BenchmarkDictHit	50000000	        39.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkDictMiss	50000000	        39.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapHit	100000000	        22.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapMiss	50000000	        36.8 ns/op	       0 B/op	       0 allocs/op

您的id函数需要迭代字符串。对于字符串,range不会迭代字节,而是寻找符文,这将更加昂贵。您将希望直接索引字符串,或者可能在整个过程中使用[]byte(在成本上大致相同)。通过更好的字符串处理,这是我测试的最终时间。

BenchmarkDictHit	100000000	        17.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkDictMiss	100000000	        17.2 ns/op	       0 B/op	       0 allocs/op
英文:

For one thing, you're using a very large amount of memory compared to the built-in map, but that's a trade-of you mentioned you wanted to make.

Use the standard libraries benchmark utilities. It will give you a solid base to work from, easier profiling access, and remove a lot of guesswork. I had a moment to cut&paste some of your code into a benchmark:

func BenchmarkDictHit(b *testing.B) {
donothing = true
dic := new(Dictionary250_uint16)
dic.Add(`test1`, 10)
dic.Add(`test2`, 20)
dic.Add(`test3`, 30)
dic.Add(`test4`, 40)
dic.Add(`test5`, 50)
// The initial Dict allocation is very expensive!
b.ResetTimer()
for i := 0; i &lt; b.N; i++ {
if dic.Exists(`test4`) {
donothing = true
}
}
}
func BenchmarkDictMiss(b *testing.B) {
donothing = true
dic := new(Dictionary250_uint16)
dic.Add(`test1`, 10)
dic.Add(`test2`, 20)
dic.Add(`test3`, 30)
dic.Add(`test4`, 40)
dic.Add(`test5`, 50)
// The initial Dict allocation is very expensive!
b.ResetTimer()
for i := 0; i &lt; b.N; i++ {
if dic.Exists(`test6`) {
donothing = true
}
}
}
func BenchmarkMapHit(b *testing.B) {
donothing = true
mc := make(map[string]uint16)
mc[`test1`] = 10
mc[`test2`] = 10
mc[`test3`] = 10
mc[`test4`] = 10
mc[`test5`] = 10
b.ResetTimer()
// Map hit
for i := 0; i &lt; b.N; i++ {
_, ok := mc[`test4`]
if ok {
donothing = true
}
}
donothing = false
}
func BenchmarkMapMiss(b *testing.B) {
donothing = true
mc := make(map[string]uint16)
mc[`test1`] = 10
mc[`test2`] = 10
mc[`test3`] = 10
mc[`test4`] = 10
mc[`test5`] = 10
b.ResetTimer()
for i := 0; i &lt; b.N; i++ {
_, ok := mc[`test6`]
if ok {
donothing = true
}
}
donothing = false
}

Without the ResetTimer() call, the initial allocation of your backing slice dominates the benchmark time, and even when amortized across the runs it skew the results heavily. After the reset, the benchmark times look in order:

BenchmarkDictHit	50000000	        39.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkDictMiss	50000000	        39.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapHit	100000000	        22.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapMiss	50000000	        36.8 ns/op	       0 B/op	       0 allocs/op

Your id function needs to iterate over a string. With strings, range doesn't iterate bytes, it looks for runes which is going to be more expensive. You will want to index the string directly, or possibly use []byte throughout (about the same cost-wise). With better string handling, these are the final timings from my test.

BenchmarkDictHit	100000000	        17.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkDictMiss	100000000	        17.2 ns/op	       0 B/op	       0 allocs/op

答案2

得分: 1

这是我版本的JimB原始版本的基准测试结果:

BenchmarkDictHit	30000000	        40.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkDictMiss	30000000	        40.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapHit		100000000	        20.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapMiss	50000000	        29.5 ns/op	       0 B/op	       0 allocs/op

在有利的情况下,Go的映射实现可以非常快速。总体而言,您的基准测试是人为的且没有意义的。

英文:

Here are the results from my version of JimB's original version of your benchmark:

BenchmarkDictHit	30000000	        40.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkDictMiss	30000000	        40.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapHit		100000000	        20.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkMapMiss	50000000	        29.5 ns/op	       0 B/op	       0 allocs/op

Under favorable circumstances, Go's map implementation can be quite fast. Overall, your benchmark is contrived and meaningless.

huangapple
  • 本文由 发表于 2014年7月23日 02:54:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/24895456.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定