February 28, 2017, 19:41:09 · go

Porting MeiYan hash function to Go

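Question

I want to port MeiYan, a state-of-the-art hash function, from C to Go. (As far as I know, it is one of the best hash functions for hash tables in terms of speed and collision rate, if not the best; it beats MurMur at least.)

I am new to Go, having spent just one weekend with it, and came up with this version: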

func meiyan(key *byte, count int) uint32 {
    type P *uint32
    var h uint32 = 0x811c9dc5
    for count >= 8 {
        a := ((*(*uint32)(unsafe.Pointer(key))) << 5)
        b := ((*(*uint32)(unsafe.Pointer(key))) >> 27)
        c := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 4))
        h = (h ^ ((a | b) ^ c)) * 0xad3e7
        count -= 8
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 8))
    }
    if (count & 4) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 2) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 1) != 0 {
        h = (h ^ uint32(*key))
        h = h * 0xad3e7
    }
    return h ^ (h >> 16)
}

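It looks messy, but I do not think I can make it look any better. I measured the speed and it is frustratingly slow: when compiled with gccgo -O3, it is 3 times slower than the C/C++ version. Can this be made faster? Is this as good as the compiler can do, or is the unsafe.Pointer conversion simply this slow? This surprised me, because I have seen other number-crunching code in Go run as fast as C or even faster. Am I doing something inefficiently here?

Here is the original C code I am porting from: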

u32 meiyan(const char *key, int count) {
    typedef u32* P;
    u32 h = 0x811c9dc5;
    while (count >= 8) {
        h = (h ^ ((((*(P)key) << 5) | ((*(P)key) >> 27)) ^ *(P)(key + 4))) * 0xad3e7;
        count -= 8;
        key += 8;
    }
    #define tmp h = (h ^ *(u16*)key) * 0xad3e7; key += 2;
    if (count & 4) { tmp tmp }
    if (count & 2) { tmp }
    if (count & 1) { h = (h ^ *key) * 0xad3e7; }
    #undef tmp
    return h ^ (h >> 16);
}

Here is how I measure the speed:

func main(){
    T := time.Now().UnixNano()/1e6
    buf := []byte("Hello World!")
    var controlSum uint64 = 0
    for x := 123; x < 1e8; x++ {
        controlSum += uint64(meiyan(&buf[0], 12))
    }
    fmt.Println(time.Now().UnixNano()/1e6 - T, "ms")
    fmt.Println("controlSum:", controlSum)
}

Answer 1

Score: 3

After some careful research I found out why my code was slow, and improved it so that it is now faster than the C version in my tests:

package main

import (
	"fmt"
	"time"
	"unsafe"
)

func meiyan(key *byte, count int) uint32 {
	type un unsafe.Pointer
	type p32 *uint32
	type p16 *uint16
	type p8 *byte
	var h uint32 = 0x811c9dc5
	for count >= 8 {
		a := *p32(un(key)) << 5
		b := *p32(un(key)) >> 27
		c := *p32(un(uintptr(un(key)) + 4))
		h = (h ^ ((a | b) ^ c)) * 0xad3e7
		count -= 8
		key = p8(un(uintptr(un(key)) + 8))
	}
	if (count & 4) != 0 {
		h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
		key = p8(un(uintptr(un(key)) + 2))
		h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
		key = p8(un(uintptr(un(key)) + 2))
	}
	if (count & 2) != 0 {
		h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
		key = p8(un(uintptr(un(key)) + 2))
	}
	if (count & 1) != 0 {
		h = h ^ uint32(*key)
		h = h * 0xad3e7
	}
	return h ^ (h >> 16)
}

func main() {
	T := time.Now().UnixNano() / 1e6
	buf := []byte("ABCDEFGHABCDEFGH")
	var controlSum uint64 = 0
	start := &buf[0]
	size := len(buf)
	for x := 123; x < 1e8; x++ {
		controlSum += uint64(meiyan(start, size))
	}
	fmt.Println(time.Now().UnixNano()/1e6-T, "ms")
	fmt.Println("controlSum:", controlSum)
}

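The hash function itself was already fast; what made the benchmark slow was indexing the slice (&buf[0]) on every iteration. I moved that out of the loop as start := &buf[0] and now pass start on each iteration.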
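
For comparison, here is a minimal sketch of the same algorithm written against a []byte with no unsafe at all. It is not part of the original answer: the names meiyanSafe and meiyanPrime are made up for illustration, and it assumes little-endian loads, which is what the pointer casts above do on x86. Like the Go port above, it treats the final odd byte as unsigned, whereas plain char in the C code may be signed.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// meiyanPrime is the multiplier used by the C code (name is hypothetical).
const meiyanPrime = 0xad3e7

// meiyanSafe hashes key without package unsafe.
// Assumption: little-endian reads, matching the pointer casts on x86.
func meiyanSafe(key []byte) uint32 {
	h := uint32(0x811c9dc5)
	// Main loop: consume 8 bytes per step.
	for len(key) >= 8 {
		x := binary.LittleEndian.Uint32(key)     // first 4 bytes
		c := binary.LittleEndian.Uint32(key[4:]) // next 4 bytes
		// (x << 5) | (x >> 27) is a left rotation by 5 bits.
		h = (h ^ (bits.RotateLeft32(x, 5) ^ c)) * meiyanPrime
		key = key[8:]
	}
	// Tail: at most 7 bytes remain, handled as 4 + 2 + 1.
	if len(key)&4 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * meiyanPrime
		h = (h ^ uint32(binary.LittleEndian.Uint16(key[2:]))) * meiyanPrime
		key = key[4:]
	}
	if len(key)&2 != 0 {
		h = (h ^ uint32(binary.LittleEndian.Uint16(key))) * meiyanPrime
		key = key[2:]
	}
	if len(key)&1 != 0 {
		// The last byte is treated as unsigned here.
		h = (h ^ uint32(key[0])) * meiyanPrime
	}
	return h ^ (h >> 16)
}

func main() {
	fmt.Printf("%#08x\n", meiyanSafe([]byte("Hello World!")))
}
```

Whether this keeps up with the unsafe version depends on the compiler and its version, so it is worth measuring before relying on it.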

Answer 2

Score: 1

The implementation from NATS looks impressive! On my machine, for 30 bytes of data, it reaches 157175656.56 ops/sec at 6.36 ns/op. Take a look at it; you might find some ideas.
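
Figures like ops/sec and ns/op can be gathered with the standard testing package rather than hand-rolled timing. The sketch below is only an illustration: the NATS code is not reproduced here, so fnv1a is a stand-in hash, and benchHash and sink are hypothetical names. Swap in whichever hash function is being measured.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the hash results.
var sink uint32

// fnv1a is a stand-in 32-bit FNV-1a hash, used for illustration only.
func fnv1a(data []byte) uint32 {
	h := uint32(2166136261)
	for _, c := range data {
		h ^= uint32(c)
		h *= 16777619
	}
	return h
}

// benchHash runs hash over data under the testing framework,
// which picks the iteration count b.N on its own.
func benchHash(hash func([]byte) uint32, data []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(data)))
		var acc uint32
		for i := 0; i < b.N; i++ {
			acc += hash(data)
		}
		sink = acc
	})
}

func main() {
	data := make([]byte, 30) // 30 bytes, the input size quoted above
	res := benchHash(fnv1a, data)
	nsPerOp := float64(res.T.Nanoseconds()) / float64(res.N)
	fmt.Printf("%d iterations, %.2f ns/op, %.0f ops/sec\n", res.N, nsPerOp, 1e9/nsPerOp)
}
```

Passing the hash as a function value adds an indirect call per iteration, so for very small inputs it may be fairer to write one benchmark per hash function instead.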
