Two gob encoders produce different results

Question

... and it's driving me nuts trying to understand what I'm doing wrong!

Playground: https://go.dev/play/p/ZQP8Y-gwihQ

The example looks contrived but it's drawn from code that I have where the error arose. In my code I'm hashing the bytes buffer and want the process to be predictable.

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

type Foo struct {
	Bar string
	Baz string
}

func (f *Foo) X() string {
	var b bytes.Buffer
	s := struct {
		Bar string
		Baz string
	}{
		f.Bar,
		f.Baz,
	}
	log.Printf("%v", s)
	gob.NewEncoder(&b).Encode(s)
	return fmt.Sprintf("%x", b)
}

func (f *Foo) Y(x string) string {
	var b bytes.Buffer
	s := struct {
		Bar string
		Baz string
		S   string
	}{
		f.Bar,
		f.Baz,
		x,
	}
	log.Printf("%v", s)
	gob.NewEncoder(&b).Encode(s)
	return fmt.Sprintf("%x", b)
}

func main() {
	a := &Foo{
		Bar: "bar",
		Baz: "baz",
	}

	log.Println(a.X())
	log.Println(a.Y("something"))
}

Running yields:

{bar baz}
{1cff81030102ff820001020103426172010c00010342617a010c0000000dff820103626172010362617a00 0 0}
{bar baz something}
{22ff83030102ff840001030103426172010c00010342617a010c00010153010c00000018ff840103626172010362617a0109736f6d657468696e6700 0 0}

Commenting out log.Println(a.X()) yields:

{bar baz something}
{22ff81030102ff820001030103426172010c00010342617a010c00010153010c00000018ff820103626172010362617a0109736f6d657468696e6700 0 0}

I expect the two encodings to be the same, but they differ (predictably) in locations that I assume correspond to field boundaries:

22
ff83 # 81
030102

ff84 # 82
0001030103426172010c00010342617a010c00010153010c00000018

ff84 # 82
0103626172010362617a0109736f6d657468696e6700

Even though the details differ, the behavior is consistent with what I see in my code.

I'm creating a new bytes.Buffer and gob.NewEncoder in each method and so it's unclear why invoking X changes the result of Y.

Answer 1

Score: 3

What you're missing is that the stream of bytes produced by an Encoder instance depends on global (program-wide) state in addition to the per-Encoder state. That global state consists of the types that have been registered and sent.

When you send a typed value, if the type has not yet been registered before being sent, it will be registered for you, in the global state. This assigns an internal numeric value to the type. See Register (and its companion RegisterName). When you call your X, that registers the anonymous struct type that holds s in X. When you call your Y, that registers the anonymous struct type that holds s in Y. These get different internal type numbers. By not calling X, that type is never registered, and Y's type gets registered under the first available number.
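To make the ordering effect concrete, here is a minimal sketch. The types `first` and `second` and the helper `encodeHex` are hypothetical names introduced only for this illustration. Each call uses a fresh buffer and a fresh encoder, as in the question, yet the type id near the start of each stream (the `ff81`-style bytes highlighted in the question) should still follow the order in which the process first encoded each type.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Two hypothetical payload types, used only for this illustration.
type first struct{ Bar string }
type second struct{ Bar string }

// encodeHex gob-encodes v with a brand-new buffer and encoder and
// returns the resulting bytes as a hex string.
func encodeHex(v interface{}) string {
	var b bytes.Buffer
	if err := gob.NewEncoder(&b).Encode(v); err != nil {
		log.Fatal(err)
	}
	return fmt.Sprintf("%x", b.Bytes())
}

func main() {
	// Each call has its own encoder, but the type ids embedded in the
	// output depend on which type this process encoded first.
	fmt.Println(encodeHex(first{Bar: "bar"}))
	fmt.Println(encodeHex(second{Bar: "bar"}))
	// Swapping the two lines above should swap which type gets the lower id.
}
```

Comparing the second and third bytes of the two printed lines should show the same kind of difference as in the question's output.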

> In my code I'm hashing the bytes buffer ...

That's not a great idea, for reasons that are probably obvious by now. I first suggested that explicitly registering each type in a known order would be safe enough (unless some future version changes the wire format for some, presumably good, reason), but testing shows that it doesn't help either. That's because even if the type is registered, it doesn't have a transmit number set until the first time a value of that type gets encoded. So you need to encode (and optionally then just discard) a value of each type.

Here is a functioning example of carefully discard-encoding the two types, so that commenting out the call to log.Println(a.X()) has no effect on the encoding of the second value.

huangapple
  • Published on 2021-12-25 08:29:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/70477328.html