Murmur3 Hash Compatibility Between Go and Python

Question

We have two libraries, one in Python and one in Go, that need to compute murmur3 hashes identically. Unfortunately, no matter how hard we try, we cannot get them to produce the same result. It appears from this SO question about Java and Python that compatibility isn't necessarily straightforward.

Right now we're using the Python mmh3 and Go github.com/spaolacci/murmur3 libraries.

In Go:

hash := murmur3.New128()
hash.Write([]byte("chocolate-covered-espresso-beans"))
fmt.Println(base64.RawURLEncoding.EncodeToString(hash.Sum(nil)))
// Output: cLHSo2nCBxyOezviLM5gwg

In Python:

import base64
import mmh3

name = "chocolate-covered-espresso-beans"
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: jns74izOYMJwsdKjacIHHA (big byteorder)

hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='little', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg (little byteorder)

hash = mmh3.hash_bytes(name.encode('utf-8'))
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg

In Go, murmur3 returns uint64 values, so we assume signed=False in Python; however, we also tried signed=True and still did not get matching hashes.

We're open to different libraries, but are wondering whether there is something wrong with either our Go or our Python approach to computing a base64-encoded hash from a string. Any help appreciated.

Answer 1

Score: 3

That first Python result is almost right.

>>> binascii.hexlify(base64.b64decode('jns74izOYMJwsdKjacIHHA=='))
b'8e7b3be22cce60c270b1d2a369c2071c'

In Go:

	x, y := murmur3.Sum128([]byte("chocolate-covered-espresso-beans"))
	fmt.Printf("%x %x\n", x, y)

Results in:

70b1d2a369c2071c 8e7b3be22cce60c2

So the order of the two 64-bit words is flipped: Go gives x followed by y, while the Python integer written big-endian puts y first. To get the same result in Python, you can try something like:

name = "chocolate-covered-espresso-beans"
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
hash = hash[8:] + hash[:8]
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# cLHSo2nCBxyOezviLM5gwg
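
If slicing feels too implicit, the same reordering can be spelled out in terms of the two 64-bit words. The following is only a sketch: it uses the standard library alone and starts from the value in the hex dump above instead of calling mmh3, and the helper names (go_digest_from_int, go_digest_from_bytes, b64) are made up for illustration. It also assumes, as the outputs in the question suggest, that mmh3.hash_bytes returns the little-endian form of the same 128-bit value.

```python
import base64
import struct

MASK64 = (1 << 64) - 1


def go_digest_from_int(value128: int) -> bytes:
    """Reorder an unsigned 128-bit integer into word1 || word2, each big-endian."""
    h1 = value128 & MASK64          # low 64 bits: x in the Go snippet
    h2 = (value128 >> 64) & MASK64  # high 64 bits: y in the Go snippet
    return struct.pack(">QQ", h1, h2)


def go_digest_from_bytes(digest: bytes) -> bytes:
    """Reorder a 16-byte little-endian digest into the same layout."""
    h1, h2 = struct.unpack("<QQ", digest)
    return struct.pack(">QQ", h1, h2)


def b64(digest: bytes) -> str:
    """URL-safe base64 without padding, like Go's base64.RawURLEncoding."""
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# 128-bit value from the hex dump above (8e7b3be22cce60c2 70b1d2a369c2071c).
value = 0x8E7B3BE22CCE60C270B1D2A369C2071C

print(b64(go_digest_from_int(value)))                           # cLHSo2nCBxyOezviLM5gwg
print(b64(go_digest_from_bytes(value.to_bytes(16, "little"))))  # cLHSo2nCBxyOezviLM5gwg
```

With mmh3 installed, you would pass mmh3.hash128(name.encode('utf-8'), signed=False) to the first helper, or the output of mmh3.hash_bytes to the second.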

huangapple
  • Posted on April 4, 2023 at 00:14:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75921577.html