Murmur3 Hash Compatibility Between Go and Python
Question
We have two different libraries, one in Python and one in Go, that need to compute murmur3 hashes identically. Unfortunately, no matter how hard we try, we cannot get the libraries to produce the same result. It appears from this SO question about Java and Python that compatibility isn't necessarily straightforward.
Right now we're using the python mmh3 and Go github.com/spaolacci/murmur3 libraries.
In Go:
hash := murmur3.New128()
hash.Write([]byte("chocolate-covered-espresso-beans"))
fmt.Println(base64.RawURLEncoding.EncodeToString(hash.Sum(nil)))
// Output: cLHSo2nCBxyOezviLM5gwg
In Python:
name = "chocolate-covered-espresso-beans"
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: jns74izOYMJwsdKjacIHHA (big byteorder)
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='little', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg (little byteorder)
hash = mmh3.hash_bytes(name.encode('utf-8'))
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg
In Go, murmur3 returns a uint64, so we assume signed=False in Python; however, we also tried signed=True and did not get matching hashes.
We're open to different libraries, but are wondering if there is something wrong with either our Go or Python methodologies of computing a base64 encoded hash from a string. Any help appreciated.
Answer 1
Score: 3
That first Python result is almost right.
>>> binascii.hexlify(base64.b64decode('jns74izOYMJwsdKjacIHHA=='))
b'8e7b3be22cce60c270b1d2a369c2071c'
In Go:
x, y := murmur3.Sum128([]byte("chocolate-covered-espresso-beans"))
fmt.Printf("%x %x\n", x, y)
Results in:
70b1d2a369c2071c 8e7b3be22cce60c2
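So the two 64-bit words come out in the opposite order: Go writes 70b1d2a369c2071c first, while the big-endian Python bytes start with 8e7b3be22cce60c2. To get the same result in Python, you can try something like: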
import base64
import mmh3

name = "chocolate-covered-espresso-beans"
digest = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
# Swap the two 8-byte halves so they appear in the same order as in the Go output
digest = digest[8:] + digest[:8]
print(base64.urlsafe_b64encode(digest).decode('utf-8').strip("="))
# cLHSo2nCBxyOezviLM5gwg
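The same reordering can also be done on the 128-bit integer instead of on the byte string. The following is a minimal sketch that uses only the standard library, so it starts from the value in the hex dump above rather than calling mmh3. The helper names to_go_sum_bytes and b64url_nopad are hypothetical, and the assumption that Go emits the low 64-bit word first is inferred from the outputs in this thread, not taken from the library documentation.

```python
import base64

MASK64 = (1 << 64) - 1


def to_go_sum_bytes(h128: int) -> bytes:
    """Hypothetical helper: lay out an unsigned 128-bit hash value as
    low 64-bit word, then high 64-bit word, each big-endian.
    Assumption: this matches the byte order seen in the Go output above."""
    low = h128 & MASK64   # 0x70b1d2a369c2071c for the example value
    high = h128 >> 64     # 0x8e7b3be22cce60c2 for the example value
    return low.to_bytes(8, "big") + high.to_bytes(8, "big")


def b64url_nopad(data: bytes) -> str:
    """URL-safe base64 without '=' padding, like Go's base64.RawURLEncoding."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


# 128-bit value taken from the hex dump of the first Python result above.
value = 0x8E7B3BE22CCE60C270B1D2A369C2071C

print(b64url_nopad(to_go_sum_bytes(value)))  # cLHSo2nCBxyOezviLM5gwg
```

With mmh3 installed, value would instead come from mmh3.hash128(name.encode('utf-8'), signed=False), as in the snippet above.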