Porting the MeiYan hash function to Go

Question

I want to port MeiYan, a state-of-the-art hash function, from C to Go. (As far as I know, it is one of the best hash functions for hash tables in terms of speed and collision rate, if not the best; it beats Murmur at least.)

I am new to Go, just spent one weekend with it, and came up with this version:

func meiyan(key *byte, count int) uint32 {
	type P *uint32
	var h uint32 = 0x811c9dc5
	for count >= 8 {
		a := ((*(*uint32)(unsafe.Pointer(key))) << 5)
		b := ((*(*uint32)(unsafe.Pointer(key))) >> 27)
		c := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 4))
		h = (h ^ ((a | b) ^ c)) * 0xad3e7
		count -= 8
		key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 8))
	}
	if (count & 4) != 0 {
		h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
		key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
		h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
		key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
	}
	if (count & 2) != 0 {
		h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
		key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
	}
	if (count & 1) != 0 {
		h = (h ^ uint32(*key))
		h = h * 0xad3e7
	}
	return h ^ (h >> 16)
}

It looks messy, but I do not think I can make it look better. When I measure the speed, it is frustratingly slow: 3 times slower than C/C++ when compiled with gccgo -O3. Can this be made faster? Is this as good as the compiler can make it, or is the unsafe.Pointer conversion simply this slow? This surprised me, because I have seen other number-crunching code run as fast as C or even faster. Am I doing something inefficiently here?

Here is the original C code I am porting from:

u32 meiyan(const char *key, int count) {
	typedef u32* P;
	u32 h = 0x811c9dc5;
	while (count >= 8) {
		h = (h ^ ((((*(P)key) << 5) | ((*(P)key) >> 27)) ^ *(P)(key + 4))) * 0xad3e7;
		count -= 8;
		key += 8;
	}
	#define tmp h = (h ^ *(u16*)key) * 0xad3e7; key += 2;
	if (count & 4) { tmp tmp }
	if (count & 2) { tmp }
	if (count & 1) { h = (h ^ *key) * 0xad3e7; }
	#undef tmp
	return h ^ (h >> 16);
}
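
For comparison, the same steps can also be written without unsafe.Pointer. What follows is only a minimal sketch, not code from the question: the name meiyanSafe is hypothetical, and it assumes little-endian byte order so that it reads the same values the pointer casts read on x86. It relies on binary.LittleEndian from encoding/binary and bits.RotateLeft32 from math/bits. Whether it is as fast as the pointer version depends on the compiler and would need to be measured.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// meiyanSafe (hypothetical name) follows the same steps as the pointer-based
// port but reads through the slice, so every access is bounds-checked.
// Assumption: little-endian byte order, as on x86.
func meiyanSafe(key []byte) uint32 {
	const prime = 0xad3e7
	h := uint32(0x811c9dc5)
	// Main loop: consume 8 bytes per round.
	for len(key) >= 8 {
		a := binary.LittleEndian.Uint32(key)     // bytes 0..3
		b := binary.LittleEndian.Uint32(key[4:]) // bytes 4..7
		h = (h ^ (bits.RotateLeft32(a, 5) ^ b)) * prime
		key = key[8:]
	}
	// Tail: fewer than 8 bytes remain, so the bits of len(key) select the steps.
	if len(key)&4 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * prime
		h = (h ^ uint32(binary.LittleEndian.Uint16(key[2:]))) * prime
		key = key[4:]
	}
	if len(key)&2 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * prime
		key = key[2:]
	}
	if len(key)&1 != 0 {
		h = (h ^ uint32(key[0])) * prime
	}
	return h ^ (h >> 16)
}

func main() {
	fmt.Printf("%#08x\n", meiyanSafe([]byte("Hello World!")))
}
```

Taking a []byte instead of a pointer and a length also removes the need for callers to write &buf[0].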

Here is how I measure speed:

func main() {
	T := time.Now().UnixNano() / 1e6
	buf := []byte("Hello World!")
	var controlSum uint64 = 0
	for x := 123; x < 1e8; x++ {
		controlSum += uint64(meiyan(&buf[0], 12))
	}
	fmt.Println(time.Now().UnixNano()/1e6-T, "ms")
	fmt.Println("controlSum:", controlSum)
}

Answer 1

Score: 3

After some careful research I found out why my code was slow, and improved it so it is now faster than the C version in my tests:

package main

import (
	"fmt"
	"time"
	"unsafe"
)

func meiyan(key *byte, count int) uint32 {
	type un unsafe.Pointer
	type p32 *uint32
	type p16 *uint16
	type p8 *byte
	var h uint32 = 0x811c9dc5
	for count >= 8 {
		a := *p32(un(key)) << 5
		b := *p32(un(key)) >> 27
		c := *p32(un(uintptr(un(key)) + 4))
		h = (h ^ ((a | b) ^ c)) * 0xad3e7
		count -= 8
		key = p8(un(uintptr(un(key)) + 8))
	}
	if (count & 4) != 0 {
		h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
		key = p8(un(uintptr(un(key)) + 2))
		h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
		key = p8(un(uintptr(un(key)) + 2))
	}
	if (count & 2) != 0 {
		h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
		key = p8(un(uintptr(un(key)) + 2))
	}
	if (count & 1) != 0 {
		h = h ^ uint32(*key)
		h = h * 0xad3e7
	}
	return h ^ (h >> 16)
}

func main() {
	T := time.Now().UnixNano() / 1e6
	buf := []byte("ABCDEFGHABCDEFGH")
	var controlSum uint64 = 0
	start := &buf[0]
	size := len(buf)
	for x := 123; x < 1e8; x++ {
		controlSum += uint64(meiyan(start, size))
	}
	fmt.Println(time.Now().UnixNano()/1e6-T, "ms")
	fmt.Println("controlSum:", controlSum)
}

The hash function itself was already fast; what made the benchmark slow was indexing the slice on every iteration. I replaced the &buf[0] argument inside the loop with start := &buf[0] before the loop, and then passed start on each iteration.
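
One way to check this claim on a given toolchain is to benchmark both call styles side by side. The sketch below is an illustration under assumptions, not the answerer's harness: benchInline, benchHoisted, load32, load16, and sink are hypothetical names, the hash is a condensed rewrite of the function above that tracks an offset instead of advancing the pointer, and it assumes a platform that tolerates unaligned loads (such as amd64). It uses testing.Benchmark, which can be called from an ordinary program. The numbers depend on the Go version and machine, so no result is claimed here.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

const prime = 0xad3e7

// load32 and load16 read native-endian values at p+off (hypothetical helpers).
func load32(p unsafe.Pointer, off uintptr) uint32 {
	return *(*uint32)(unsafe.Pointer(uintptr(p) + off))
}

func load16(p unsafe.Pointer, off uintptr) uint16 {
	return *(*uint16)(unsafe.Pointer(uintptr(p) + off))
}

// meiyan is a condensed rewrite of the pointer-based port.
func meiyan(key *byte, count int) uint32 {
	p := unsafe.Pointer(key)
	var off uintptr
	h := uint32(0x811c9dc5)
	for ; count >= 8; count -= 8 {
		a := load32(p, off)
		h = (h ^ ((a<<5 | a>>27) ^ load32(p, off+4))) * prime
		off += 8
	}
	if count&4 != 0 {
		h = (h ^ uint32(load16(p, off))) * prime
		h = (h ^ uint32(load16(p, off+2))) * prime
		off += 4
	}
	if count&2 != 0 {
		h = (h ^ uint32(load16(p, off))) * prime
		off += 2
	}
	if count&1 != 0 {
		h = (h ^ uint32(*(*byte)(unsafe.Pointer(uintptr(p) + off)))) * prime
	}
	return h ^ (h >> 16)
}

// sink is package-level so the compiler cannot discard the hashing loop.
var sink uint64

// benchInline takes &buf[0] inside the loop, as in the question.
func benchInline(b *testing.B) {
	buf := []byte("ABCDEFGHABCDEFGH")
	for i := 0; i < b.N; i++ {
		sink += uint64(meiyan(&buf[0], len(buf)))
	}
}

// benchHoisted takes the address once before the loop, as in the answer.
func benchHoisted(b *testing.B) {
	buf := []byte("ABCDEFGHABCDEFGH")
	start, size := &buf[0], len(buf)
	for i := 0; i < b.N; i++ {
		sink += uint64(meiyan(start, size))
	}
}

func main() {
	fmt.Println("inline: ", testing.Benchmark(benchInline))
	fmt.Println("hoisted:", testing.Benchmark(benchHoisted))
}
```

Each line of output reports the iteration count and the time per operation for one variant, which makes the two call styles directly comparable.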

Answer 2

Score: 1

The implementation from NATS looks impressive! On my machine, for 30 bytes of data, it reaches 157175656.56 ops/sec, or 6.36 ns/op. Take a look at it; you might find some ideas.

huangapple
  • Published on February 28, 2017, 19:41:09
  • When reposting, please keep this link: https://go.coder-hub.com/42507846.html