Use Gob to write logs to a file in an append style

Question

Would it be possible to use Gob encoding for appending structs in series to the same file using append? It works for writing, but when reading with the decoder more than once I run into:

extra data in buffer

So I wonder if that's possible in the first place, or whether I should use something like JSON and append JSON documents on a per-line basis instead. Because the alternative would be to serialize a slice, but then again reading it as a whole would defeat the purpose of append.

Answer 1

Score: 6

The gob package wasn't designed to be used this way. A gob stream has to be written by a single gob.Encoder, and it also has to be read by a single gob.Decoder.

The reason for this is that the gob package not only serializes the values you pass to it, it also transmits data describing their types:

> A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

This is state held by the encoder / decoder (which types have been transmitted, and how); a subsequent new encoder / decoder will not (cannot) analyze the preceding stream to reconstruct the same state and continue where a previous encoder / decoder left off.

Of course if you create a single gob.Encoder, you may use it to serialize as many values as you'd like to.

You can also create a gob.Encoder and write to a file, then later create a new gob.Encoder and append to the same file, but then you must use 2 gob.Decoders to read those values, exactly matching the encoding process.

As a demonstration, let's follow an example. This example will write to an in-memory buffer (bytes.Buffer). 2 subsequent encoders will write to it, then we will use 2 subsequent decoders to read the values. We'll write values of this struct:

type Point struct {
	X, Y int
}

For short, compact code, I use this "error handler" function:

func he(err error) {
	if err != nil {
		panic(err)
	}
}

And now the code:

const n, m = 3, 2
buf := &bytes.Buffer{}

e := gob.NewEncoder(buf)
for i := 0; i < n; i++ {
	he(e.Encode(&Point{X: i, Y: i * 2}))
}

e = gob.NewEncoder(buf)
for i := 0; i < m; i++ {
	he(e.Encode(&Point{X: i, Y: 10 + i}))
}

d := gob.NewDecoder(buf)
for i := 0; i < n; i++ {
	var p *Point
	he(d.Decode(&p))
	fmt.Println(p)
}

d = gob.NewDecoder(buf)
for i := 0; i < m; i++ {
	var p *Point
	he(d.Decode(&p))
	fmt.Println(p)
}

Output (try it on the Go Playground):

&{0 0}
&{1 2}
&{2 4}
&{0 10}
&{1 11}

Note that if we used only 1 decoder to read all the values (looping until i < n + m), we would get the same error message you posted in your question when the iteration reaches n + 1, because the subsequent data is not a serialized Point, but the start of a new gob stream.
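
To make this concrete, replacing the two decoding loops above with a single one (a sketch reusing buf, n, m and he from the example) fails at the stream boundary:

d := gob.NewDecoder(buf)
for i := 0; i < n+m; i++ {
	var p *Point
	he(d.Decode(&p)) // panics with "extra data in buffer" once the first stream is exhausted
	fmt.Println(p)
}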

So if you want to stick with the gob package for doing what you want to do, you have to slightly modify and enhance your encoding / decoding process: you have to somehow mark the boundaries where a new encoder is used (so when decoding, you'll know you have to create a new decoder to read the subsequent values).

You may use different techniques to achieve this:

  • You may write out a number, a count, before you proceed to write values; this number tells how many values were written using the current encoder (a sketch of this technique follows this list).
  • If you don't want to or can't tell in advance how many values will be written with the current encoder, you may opt to write out a special end-of-encoder value when you won't write any more values with the current encoder. When decoding, if you encounter this special end-of-encoder value, you'll know you have to create a new decoder to be able to read more values.
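
For example, here is a minimal sketch of the first (count-prefix) technique; writeBatch and readBatch are hypothetical helper names, not part of the encoding/gob API:

// writeBatch appends one batch to w with a fresh encoder:
// first the number of values, then the values themselves.
func writeBatch(w io.Writer, points []Point) error {
	enc := gob.NewEncoder(w)
	if err := enc.Encode(len(points)); err != nil {
		return err
	}
	for i := range points {
		if err := enc.Encode(&points[i]); err != nil {
			return err
		}
	}
	return nil
}

// readBatch reads one batch with a fresh decoder;
// it returns io.EOF when no more batches follow.
func readBatch(r io.Reader) ([]Point, error) {
	dec := gob.NewDecoder(r)
	var n int
	if err := dec.Decode(&n); err != nil {
		return nil, err
	}
	points := make([]Point, n)
	for i := range points {
		if err := dec.Decode(&points[i]); err != nil {
			return nil, err
		}
	}
	return points, nil
}

Note that for readBatch to work on a shared reader, the reader must not buffer ahead; see the Warning section below.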

Some things to note here:

  • The gob package is most efficient, most compact if only a single encoder is used, because each time you create and use a new encoder, the type specifications will have to be re-transmitted, causing more overhead, and making the encoding / decoding process slower.
  • You can't seek in the data stream; you can only decode a value by reading the whole file from the beginning up until the value you want. Note that this somewhat applies even if you use other formats (such as JSON or XML).

If you want seeking functionality, you'd need to manage an index file separately, which would tell at which positions new encoders / decoders start, so you could seek to that position, create a new decoder, and start reading values from there.
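
For example, a minimal sketch of reading from such an indexed position (decoderAt is a hypothetical helper; it relies on the newByteReader wrapper from the Warning section below so the decoder cannot buffer past its starting point):

// decoderAt seeks to the recorded start of a gob stream and
// returns a fresh decoder reading from there.
func decoderAt(f *os.File, offset int64) (*gob.Decoder, error) {
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		return nil, err
	}
	return gob.NewDecoder(newByteReader(f)), nil
}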

Warning

gob.NewDecoder() documents that:

> If r does not also implement io.ByteReader, it will be wrapped in a bufio.Reader.

This means that if you use os.File for example (it does not implement io.ByteReader), the internally used bufio.Reader might read more data from the passed reader than what gob.Decoder actually uses (as its name says, it does buffered IO). So using multiple decoders on the same input reader might result in decoding errors, as the internally used bufio.Reader of a previous decoder may consume data that is never passed on to the next decoder.

A solution / workaround is to explicitly pass a reader that implements io.ByteReader and does not read "ahead" into a buffer. For example:

// byteReader adds a ReadByte method to an io.Reader, reading
// exactly one byte at a time so it never buffers ahead.
type byteReader struct {
	io.Reader
	buf []byte
}

func (br byteReader) ReadByte() (byte, error) {
	if _, err := io.ReadFull(br, br.buf); err != nil {
		return 0, err
	}
	return br.buf[0], nil
}

// newByteReader wraps r with a 1-byte scratch buffer.
func newByteReader(r io.Reader) byteReader {
	return byteReader{r, make([]byte, 1)}
}
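
For example, two subsequent decoders can share one file through this wrapper (a sketch reusing Point and he from the example above; f is an already opened os.File):

br := newByteReader(f)

d := gob.NewDecoder(br) // decodes the first gob stream, byte by byte
var p Point
he(d.Decode(&p)) // a value written by the first encoder

d = gob.NewDecoder(br) // a fresh decoder picks up exactly where the previous one stopped
he(d.Decode(&p)) // a value written by the second encoder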

See a faulty example without this wrapper: https://go.dev/play/p/dp1a4dMDmNc

And see how the above wrapper fixes the problem: https://go.dev/play/p/iw528FTFxmU

Check a related question: https://stackoverflow.com/questions/37618399/efficient-go-serialization-of-struct-to-disk/37620399#37620399

Answer 2

Score: 1

In addition to the above, I suggest using an intermediate structure to exclude the gob header:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"log"
)

type Point struct {
	X, Y int
}

func main() {
	buf := new(bytes.Buffer)
	enc, _, err := NewEncoderWithoutHeader(buf, new(Point))
	if err != nil {
		log.Fatal(err)
	}
	enc.Encode(&Point{10, 10})
	fmt.Println(buf.Bytes())
}

type HeaderSkiper struct {
	src io.Reader
	dst io.Writer
}

func (hs *HeaderSkiper) Read(p []byte) (int, error) {
	return hs.src.Read(p)
}

func (hs *HeaderSkiper) Write(p []byte) (int, error) {
	return hs.dst.Write(p)
}

func NewEncoderWithoutHeader(w io.Writer, sample interface{}) (*gob.Encoder, *bytes.Buffer, error) {
	hs := new(HeaderSkiper)
	hdr := new(bytes.Buffer)
	hs.dst = hdr

	enc := gob.NewEncoder(hs)
	// Write sample with header info
	if err := enc.Encode(sample); err != nil {
		return nil, nil, err
	}
	// Change writer
	hs.dst = w
	return enc, hdr, nil
}

func NewDecoderWithoutHeader(r io.Reader, hdr *bytes.Buffer, dummy interface{}) (*gob.Decoder, error) {
	hs := new(HeaderSkiper)
	hs.src = hdr

	dec := gob.NewDecoder(hs)
	if err := dec.Decode(dummy); err != nil {
		return nil, err
	}

	hs.src = r
	return dec, nil
}
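
For example, a minimal usage sketch of these helpers (error handling abbreviated): keep the header buffer returned by NewEncoderWithoutHeader and replay it through NewDecoderWithoutHeader before decoding the header-less values:

buf := new(bytes.Buffer)
enc, hdr, err := NewEncoderWithoutHeader(buf, new(Point))
if err != nil {
	log.Fatal(err)
}
enc.Encode(&Point{1, 2}) // appended without header bytes
enc.Encode(&Point{3, 4})

dec, err := NewDecoderWithoutHeader(buf, hdr, new(Point))
if err != nil {
	log.Fatal(err)
}
var p Point
dec.Decode(&p) // {1 2}
dec.Decode(&p) // {3 4}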

Answer 3

Score: 0

In addition to icza's great answer, you could use the following trick to append to a gob file that already contains data: when appending, first write and discard an initial encode:

  1. Create the file and encode gobs as usual (the first encode writes the headers)
  2. Close the file
  3. Open the file for append
  4. Using an intermediate writer, encode a dummy struct (which writes the headers)
  5. Reset the writer
  6. Encode gobs as usual (no headers are written this time)

Example:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"os"
)

type Record struct {
	ID   int
	Body string
}

func main() {
	r1 := Record{ID: 1, Body: "abc"}
	r2 := Record{ID: 2, Body: "def"}

	// encode r1
	var buf1 bytes.Buffer
	enc := gob.NewEncoder(&buf1)
	err := enc.Encode(r1)
	if err != nil {
		log.Fatal(err)
	}

	// write to file
	err = ioutil.WriteFile("/tmp/log.gob", buf1.Bytes(), 0600)
	if err != nil {
		log.Fatal(err)
	}

	// encode dummy (which writes the headers)
	var buf2 bytes.Buffer
	enc = gob.NewEncoder(&buf2)
	err = enc.Encode(Record{})
	if err != nil {
		log.Fatal(err)
	}

	// discard the dummy bytes, keeping the encoder state
	buf2.Reset()

	// encode r2 (no headers are written this time)
	err = enc.Encode(r2)
	if err != nil {
		log.Fatal(err)
	}

	// open file for append
	f, err := os.OpenFile("/tmp/log.gob", os.O_WRONLY|os.O_APPEND, 0600)
	if err != nil {
		log.Fatal(err)
	}

	// write r2
	_, err = f.Write(buf2.Bytes())
	if err != nil {
		log.Fatal(err)
	}

	// decode the whole file
	data, err := ioutil.ReadFile("/tmp/log.gob")
	if err != nil {
		log.Fatal(err)
	}

	var r Record
	dec := gob.NewDecoder(bytes.NewReader(data))
	for {
		err = dec.Decode(&r)
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(r)
	}
}
