英文:
Implementing a Merkle-tree data structure in Go
问题
我目前正在尝试在Go中实现Merkle树数据结构。基本上,我的最终目标是存储一小组结构化数据(最大10MB),并允许这个“数据库”与分布在网络上的其他节点轻松同步(参见相关文档)。
我已经在Node中相当有效地实现了这一点,因为它没有类型检查。这就是Go的问题所在,我想利用Go的编译时类型检查,但我也希望有一个可以与任何提供的树一起工作的库。
简而言之,我想将结构体用作Merkle节点,并且我想在所有类型中嵌入一个Merkle.Update()
方法。我试图避免为每个结构体编写一个Update()
方法(尽管我知道这可能是唯一/最好的方法)。
我的想法是使用嵌入类型:
//库
type Merkle struct {
Initialised bool
Container interface{} //在示例中,这引用了foo
Fields []reflect.Type
//... 其他Merkle状态
}
//Merkle方法... Update()...等等...
//用户代码
type Foo struct {
Merkle
A int
B bool
C string
D map[string]*Bazz
E []*Bar
}
type Bazz struct {
Merkle
S int
T int
U int
}
type Bar struct {
Merkle
X int
Y int
Z int
}
在这个示例中,Foo
将是根节点,它将包含Bazz
和Bar
。这种关系可以通过反射类型来推断出来。问题在于使用方式:
foo := &Foo{
A: 42,
B: true,
C: "foo",
D: map[string]*Bazz{
"b1": &Bazz{},
"b2": &Bazz{},
},
E: []*Bar{
&Bar{},
&Bar{},
&Bar{},
},
}
merkle.Init(foo)
foo.Hash //初始哈希值 => abc...
foo.A = 35
foo.E = append(foo.E, &Bar{})
foo.Update()
foo.Hash //更新后的哈希值 => def...
我认为我们需要merkle.Init(foo)
,因为foo.Init()
实际上是foo.Merkle.Init()
,无法反射foo
。未初始化的Bar
和Bazz
可以通过父节点foo.Update()
来检测和初始化。一些反射是可以接受的,因为正确性比性能更重要。
另一个问题是,当我们Update()
一个节点时,所有结构字段(子节点)也需要进行Update()
(重新哈希),因为我们不确定有什么改变。我们可以使用foo.SetInt("A", 35)
来实现自动更新,但这样我们就失去了编译时的类型检查。
这是否被认为是Go的惯用方式?如果不是,如何改进?有人能想到一种在内存中存储数据集(用于快速读取)并进行简洁的数据集比较(用于高效的网络增量传输)的替代方法吗?
编辑:还有一个元问题:在哪里提出这种问题最好,StackOverflow、Reddit还是go-nuts?最初在reddit上发布,没有得到答案:(
英文:
I'm currently attempting to implement a merkle-tree data structure in Go. Basically, my end goal is to store a small set of structured data (10MB max) and allow this "database" to be easily synchronised with other nodes distributed over the network (see related ).
I've implemented this reasonably effectively in Node as there are no type-checks. Herein lies the problem with Go, I'd like to make use of Go's compile-time type checks, though I also want to have one library which works with any provided tree.
In short, I'd like to use structs as merkle nodes and I'd like to have one Merkle.Update()
method which is embedded in all types. I'm trying to avoid writing an Update()
for every struct (though I'm aware this might be the only/best way).
My idea was to use embedded types:
//library
type Merkle struct {
Initialised bool
Container interface{} //in example this references foo
Fields []reflect.Type
//... other merkle state
}
//Merkle methods... Update()... etc...
//userland
type Foo struct {
Merkle
A int
B bool
C string
D map[string]*Bazz
E []*Bar
}
type Bazz struct {
Merkle
S int
T int
U int
}
type Bar struct {
Merkle
X int
Y int
Z int
}
In this example, Foo
will be the root, which will contain Bazz
s and Bar
s. This relationship could be inferred by reflecting on the types. The problem is the usage:
foo := &Foo{
A: 42,
B: true,
C: "foo",
D: map[string]*Bazz{
"b1": &Bazz{},
"b2": &Bazz{},
},
E: []*Bar{
&Bar{},
&Bar{},
&Bar{},
},
}
merkle.Init(foo)
foo.Hash //Initial hash => abc...
foo.A = 35
foo.E = append(foo.E, &Bar{})
foo.Update()
foo.Hash //Updated hash => def...
I think we need to merkle.Init(foo)
since foo.Init()
would actually be foo.Merkle.Init()
and would not be able to reflect on foo
. The uninitialised Bar
s and Bazz
s could be detected and initialised by the parent foo.Update()
. Some reflection is acceptable as correctness is more important than performance at the moment.
Another problem is, when we Update()
a node, all struct fields (child nodes) would need to be Update()
d as well (rehashed) since we aren't sure what was changed. We could do foo.SetInt("A", 35)
to implement an auto-update, though then we lose compile time type-checks.
Would this be considered idiomatic Go? If not, how could this be improved? Can anyone think of an alternative way to store a dataset in memory (for fast reads) with concise dataset comparison (for efficient delta transfers over the network)?
Edit: And also a meta-question: Where is the best place to ask this kind of question, StackOverflow, Reddit or go-nuts? Originally posted on reddit with no answer
答案1
得分: 4
一些目标看起来像是:
- 哈希任何东西 -- 使其易于通过哈希大量的东西
- 缓存哈希 -- 使更新只重新哈希所需的内容
- 符合惯用法 -- 在其他 Go 代码中很好地适应
我认为你可以以类似内置的 encoding/gob
或 encoding/json
的序列化工具的方式来处理哈希任何东西,这是一个三步走的过程:如果类型实现了特殊方法(对于 JSON 来说是 MarshalJSON
),则使用该方法;对于基本类型,使用类型切换;否则,使用反射的默认情况。下面是一个提供了哈希缓存辅助函数并允许类型实现 Hash
或不实现的 API 草图:
package merkle
type HashVal uint64
const MissingHash HashVal = 0
// Hasher 为类型提供自定义的哈希实现。不是所有类型都需要实现它,但这样做可以加快更新速度。
type Hasher interface {
Hash() HashVal
}
// HashCacher 是缓存哈希值的项目接口。通常通过嵌入 HashCache 来实现。
type HashCacher interface {
CachedHash() *HashVal
}
// HashCache 实现了 HashCacher;它旨在嵌入到你的结构体中,以使哈希树的更新更高效。
type HashCache struct {
h HashVal
}
// CachedHash 实现了 HashCacher。
func (h *HashCache) CachedHash() *HashVal {
return &h.h
}
// Hash 返回某个对象的哈希值,使用缓存的哈希值或 Hash() 方法(如果有)。
func Hash(i interface{}) HashVal {
if hashCacher, ok := i.(HashCacher); ok {
if cached := *hashCacher.CachedHash(); cached != MissingHash {
return cached
}
}
switch i := i.(type) {
case Hasher:
return i.Hash()
case uint64:
return HashVal(i * 8675309) // 或者,你知道的,使用真正的哈希函数
case []byte:
// 对字节进行 CRC 校验,例如
return 0xdeadbeef
default:
return 0xdeadbeef
// 可怕的慢速递归情况,使用反射
// 例如:使用反射迭代字段,然后对每个字段进行哈希
}
// 在这里抛出 panic(),你也可以稍微冒险一点,声明对于不可哈希类型的更改不会使树无效
panic("传递给 Hash() 的类型无法哈希")
}
// Item 是 Merkle 树中的一个节点,它必须知道如何找到其父节点(根节点应返回 nil),并且通常应嵌入 HashCache 以实现高效的更新。为了避免使用反射,Item 可能也会受益于成为 Hasher。
type Item interface {
Parent() Item
HashCacher
}
// Update 更新 i 和根节点之间的项目链,给定可能已更改的叶节点。
func Update(i Item) {
for i != nil {
cached := i.CachedHash()
*cached = MissingHash // 使其无效
*cached = Hash(i)
i = i.Parent()
}
}
希望对你有所帮助!
英文:
Some goals seem like:
- Hash anything -- make it easy to use by hashing lots of things out of the box
- Cache hashes -- make updates just rehash what they need to
- Be idiomatic -- fit in well among other Go code
I think you can attack hashing anything roughly the way that serialization tools like the built-in encoding/gob
or encoding/json
do, which is three-pronged: use a special method if the type implements it (for JSON that's MarshalJSON
), use a type switch for basic types, and fall back to a nasty default case using reflection. Here's an API sketch that provides a helper for hash caching and lets types either implement Hash
or not:
package merkle
type HashVal uint64
const MissingHash HashVal = 0
// Hasher provides a custom hash implementation for a type. Not
// everything needs to implement it, but doing so can speed
// updates.
type Hasher interface {
Hash() HashVal
}
// HashCacher is the interface for items that cache a hash value.
// Normally implemented by embedding HashCache.
type HashCacher interface {
CachedHash() *HashVal
}
// HashCache implements HashCacher; it's meant to be embedded in your
// structs to make updating hash trees more efficient.
type HashCache struct {
h HashVal
}
// CachedHash implements HashCacher.
func (h *HashCache) CachedHash() *HashVal {
return &h.h
}
// Hash returns something's hash, using a cached hash or Hash() method if
// available.
func Hash(i interface{}) HashVal {
if hashCacher, ok := i.(HashCacher); ok {
if cached := *hashCacher.CachedHash(); cached != MissingHash {
return cached
}
}
switch i := i.(type) {
case Hasher:
return i.Hash()
case uint64:
return HashVal(i * 8675309) // or, you know, use a real hash
case []byte:
// CRC the bytes, say
return 0xdeadbeef
default:
return 0xdeadbeef
// terrible slow recursive case using reflection
// like: iterate fields using reflect, then hash each
}
// instead of panic()ing here, you could live a little
// dangerously and declare that changes to unhashable
// types don't invalidate the tree
panic("unhashable type passed to Hash()")
}
// Item is a node in the Merkle tree, which must know how to find its
// parent Item (the root node should return nil) and should usually
// embed HashCache for efficient updates. To avoid using reflection,
// Items might benefit from being Hashers as well.
type Item interface {
Parent() Item
HashCacher
}
// Update updates the chain of items between i and the root, given the
// leaf node that may have been changed.
func Update(i Item) {
for i != nil {
cached := i.CachedHash()
*cached = MissingHash // invalidate
*cached = Hash(i)
i = i.Parent()
}
}
答案2
得分: 1
Go语言的继承方式与其他语言不同。
“父类”无法修改子类中的项目,你需要在每个结构体上实现Update
方法,然后在其中进行操作,再调用父类的Update
方法。
func (b *Bar) Update() {
b.Merkle.Update()
//处理与b和b.Merkle相关的事务
//其他操作
}
func (f *Foo) Update() {
f.Merkle.Update()
for _, b := range f.E {
b.Update()
}
//其他操作
}
我认为你需要以不同的方式重新实现你的树结构。
另外,请在下次提问时提供一个可测试的案例。
英文:
Go doesn't have inheritance in the same way other languages do.
The "parent" can't modify items in the child, you'd have to implement Update
on each struct then do your business in it then have it call the parent's Update
.
func (b *Bar) Update() {
b.Merkle.Update()
//do stuff related to b and b.Merkle
//stuff
}
func (f *Foo) Update() {
f.Merkle.Update()
for _, b := range f.E {
b.Update()
}
//etc
}
I think you will have to re-implement your tree in a different way.
Also please provide a testable case the next time.
答案3
得分: 1
你看过https://github.com/xsleonard/go-merkle吗?它可以让你创建一个二进制默克尔树。你可以在数据的末尾添加一个类型字节来标识它。
英文:
Have you seen https://github.com/xsleonard/go-merkle which will allow you to create a binary merkle tree. You could append a type byte to the end of your data to identify it.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论