Two gob encoders produce different results
Question
... and it's driving me nuts trying to understand what I'm doing wrong!
Playground: https://go.dev/play/p/ZQP8Y-gwihQ
The example looks contrived, but it is drawn from code where the problem arose. In my code I hash the bytes buffer and want the process to be predictable.
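
    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
        "log"
    )

    type Foo struct {
        Bar string
        Baz string
    }

    // X gob-encodes a two-field anonymous struct and returns it as hex.
    func (f *Foo) X() string {
        var b bytes.Buffer
        s := struct {
            Bar string
            Baz string
        }{
            f.Bar,
            f.Baz,
        }
        log.Printf("%v", s)
        // Note: the error returned by Encode is ignored here.
        gob.NewEncoder(&b).Encode(s)
        // %x is applied to the bytes.Buffer struct itself (not b.Bytes()),
        // hence the "{<hex> 0 0}" wrapper in the output.
        return fmt.Sprintf("%x", b)
    }

    // Y gob-encodes a three-field anonymous struct and returns it as hex.
    func (f *Foo) Y(x string) string {
        var b bytes.Buffer
        s := struct {
            Bar string
            Baz string
            S   string
        }{
            f.Bar,
            f.Baz,
            x,
        }
        log.Printf("%v", s)
        gob.NewEncoder(&b).Encode(s)
        return fmt.Sprintf("%x", b)
    }

    func main() {
        a := &Foo{
            Bar: "bar",
            Baz: "baz",
        }
        log.Println(a.X())
        log.Println(a.Y("something"))
    }
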
Running yields:

    {bar baz}
    {1cff81030102ff820001020103426172010c00010342617a010c0000000dff820103626172010362617a00 0 0}
    {bar baz something}
    {22ff83030102ff840001030103426172010c00010342617a010c00010153010c00000018ff840103626172010362617a0109736f6d657468696e6700 0 0}

Commenting out `log.Println(a.X())` yields:

    {bar baz something}
    {22ff81030102ff820001030103426172010c00010342617a010c00010153010c00000018ff820103626172010362617a0109736f6d657468696e6700 0 0}

I expect the two encodings to be the same, but they differ (predictably) in locations that I assume correspond to field boundaries:

    22
    ff83 # 81
    030102
    ff84 # 82
    0001030103426172010c00010342617a010c00010153010c00000018
    ff84 # 82
    0103626172010362617a0109736f6d657468696e6700

Even though the details differ, the behavior is consistent with my code.
I'm creating a new `bytes.Buffer` and `gob.NewEncoder` in each method, so it's unclear why invoking `X` changes the result of `Y`.
Answer 1
Score: 3
What you're missing is that the stream of bytes produced by an `Encoder` instance depends on global (program-wide) state in addition to the per-`Encoder` state. That global state consists of registered-and-sent types.
When you send a typed value, if the type has not yet been registered before being sent, it will be registered for you, in the global state. This assigns an internal numeric value to the type. See `Register` (and its companion `RegisterName`). When you call your `X`, that registers the anonymous struct type that holds `s` in `X`. When you call your `Y`, that registers the anonymous struct type that holds `s` in `Y`. These get different internal type numbers. By not calling `X`, that type is never registered, and `Y`'s type gets registered under the first available number.
> In my code I'm hashing the bytes buffer ...
That's not a great idea, for reasons that are now probably obvious. <s>However, if you explicitly register each type in a known order, you'll be safe enough here unless some future version changes the wire format for some (presumably good) reason.</s> Oops, testing shows this doesn't help either. Even if the type is registered, it doesn't have a transmit number set until the first time a value of that type gets encoded. So you need to encode (and optionally then just discard) a value of each type.
Here is a functioning example of carefully discard-encoding the two types, so that commenting out the call to `log.Println(a.X())` has no effect on the encoding of the second value.
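A minimal sketch of the discard-encoding idea, written for this page rather than copied from the answer's example. The named types `xPayload` and `yPayload` and the helpers `warmUp` and `encodeHex` are hypothetical stand-ins for the anonymous structs in `X` and `Y`. Because named types carry their name on the wire, the bytes will not match the question's output exactly. The sketch also assumes nothing else in the program gob-encodes a new type before `warmUp` runs.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"log"
)

// xPayload and yPayload are hypothetical named versions of the
// anonymous structs built inside X and Y in the question.
type xPayload struct {
	Bar string
	Baz string
}

type yPayload struct {
	Bar string
	Baz string
	S   string
}

// warmUp encodes one throwaway value of each type, in a fixed order,
// so that type numbers are assigned before any real encoding happens.
func warmUp() error {
	enc := gob.NewEncoder(io.Discard)
	for _, v := range []any{xPayload{}, yPayload{}} {
		if err := enc.Encode(v); err != nil {
			return err
		}
	}
	return nil
}

// encodeHex gob-encodes v with a fresh Encoder and returns the bytes as hex.
func encodeHex(v any) (string, error) {
	var b bytes.Buffer
	if err := gob.NewEncoder(&b).Encode(v); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", b.Bytes()), nil
}

func main() {
	if err := warmUp(); err != nil {
		log.Fatal(err)
	}
	// Whether or not an xPayload is encoded here, yPayload's type
	// number was already fixed by warmUp.
	out, err := encodeHex(yPayload{Bar: "bar", Baz: "baz", S: "something"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

Calling `warmUp` once at program start, before any other gob use, is the design choice that matters here. The result still depends on gob's wire format, so a hash of these bytes is only as stable as that format.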