Encode struct object with io.Reader in a streaming manner in Go

Question

I have the following struct that I need to encode into JSON and output to an io.Writer object:

type Data struct {
	FieldA string    `json:"field_a"`
	FieldB int       `json:"field_b"`
	Rows   io.Reader `json:"rows"`
}

Rows is an io.Reader that returns a potentially very large JSON array as raw bytes. I want to avoid loading the whole result from the reader into memory, as that introduces latency and high memory overhead. Rows is guaranteed to be valid JSON, so it doesn't have to be decoded and then re-encoded; it can be passed through as is.

My problem is that the json package from the standard library doesn't support a streaming-friendly implementation of MarshalJSON: it expects you to write your result into a []byte buffer and return it.

Using json.NewEncoder(writer).Encode(data) is pretty much what I need, but I cannot define custom behavior for the reader, so the encoder just emits {} for it.

Is there a way to achieve this without a completely custom implementation of the json encoding process?

Answer 1

Score: 1

There is no way to achieve this using the standard encoding/json package. A custom JSON encoder for Data is not onerous:

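// encode writes d to w as a JSON object and streams d.Rows through
// unchanged. It needs the encoding/json, fmt and io packages.
func encode(d *Data, w io.Writer) error {
	// field writes one `"name": value, ` pair using the standard marshaler.
	field := func(name string, value any) error {
		k, err := json.Marshal(name)
		if err != nil {
			return err
		}
		v, err := json.Marshal(value)
		if err != nil {
			return err
		}
		_, err = fmt.Fprintf(w, "%s: %s, ", k, v)
		return err
	}

	if _, err := io.WriteString(w, "{ "); err != nil {
		return err
	}
	if err := field("field_a", d.FieldA); err != nil {
		return err
	}
	if err := field("field_b", d.FieldB); err != nil {
		return err
	}
	if _, err := io.WriteString(w, `"rows": `); err != nil {
		return err
	}
	// Rows is already valid JSON, so copy it straight to the output.
	if _, err := io.Copy(w, d.Rows); err != nil {
		return err
	}
	_, err := io.WriteString(w, "}\n")
	return err
}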

Answer 2

Score: 0

I settled on a solution similar to the other answer, but one that doesn't require manually serializing every other field.

I defined Data like this:

type Data struct {
    FieldA string    `json:"field_a"`
    FieldB int       `json:"field_b"`
    Rows   io.Reader `json:"-"`
}

The - in the json tag for the Rows field tells Go to always skip that field when encoding to JSON. The encode function then looks like this:

func encode(d Data, writer io.Writer) error {
	// Marshal every field except Rows, which is tagged json:"-".
	buf, err := json.Marshal(d)
	if err != nil {
		return err
	}

	// Write everything except the closing brace.
	_, err = writer.Write(buf[:len(buf)-1])
	if err != nil {
		return err
	}
	_, err = writer.Write([]byte(",\"rows\":"))
	if err != nil {
		return err
	}
	// Stream the rows straight from the reader to the output.
	_, err = io.Copy(writer, d.Rows)
	if err != nil {
		return err
	}
	_, err = writer.Write([]byte("}"))
	if err != nil {
		return err
	}

	return nil
}
  1. Serialize the object as-is into a buffer (but without the rows)
  2. Write to the output everything except the final closing brace
  3. Write ,"rows": to the output to start a new field
  4. Copy the rows reader to the output writer
  5. Write a final } to the output

It works pretty well. An end-to-end PoC with a gin web server uses about 35MB of memory to get a reader for an S3 object that represents the rows, decompress it using zstd, and serialize it directly into the gin output writer. It is even much faster than doing the whole thing in memory, since it can start returning data immediately instead of waiting for everything to be decoded into memory and then re-encoded.

The full PoC can be found here if anybody's interested:
https://github.com/TheEdgeOfRage/streaming-poc

huangapple
  • Posted on 2023-05-18 01:39:22
  • Source link: https://go.coder-hub.com/76274813.html