Encode struct object with io.Reader in a streaming manner in go
Question
I have the following struct that I need to encode into JSON and output to an io.Writer object:

```go
type Data struct {
	FieldA string    `json:"field_a"`
	FieldB int       `json:"field_b"`
	Rows   io.Reader `json:"rows"`
}
```

Rows is an io.Reader that will return a potentially very large JSON array in binary format. I want to avoid loading the whole result from the reader into memory, as that introduces latency and high memory overhead. Rows is guaranteed to be valid JSON, so it doesn't have to be decoded and then re-encoded; it can just be passed on as-is.

My problem is that the json package from the standard library doesn't support defining a streaming-friendly implementation of MarshalJSON: it expects you to write your result into a []byte buffer and return it.

Using json.NewEncoder(writer).Encode(data) is pretty much what I need, but I cannot define custom behavior for the reader, and the encoder just emits {} for it.

Is there a way to achieve this without a completely custom implementation of the JSON encoding process?
Answer 1 (Score: 1)
There is no way to achieve this using the standard encoding/json package, but a custom JSON encoder for Data is not onerous:

```go
func encode(d *Data, w io.Writer) error {
	// field writes a marshaled name/value pair followed by a
	// trailing comma.
	field := func(name string, value any) {
		b, _ := json.Marshal(name)
		w.Write(b)
		io.WriteString(w, ": ")
		b, _ = json.Marshal(value)
		w.Write(b)
		io.WriteString(w, ", ")
	}
	io.WriteString(w, "{ ")
	field("field_a", d.FieldA)
	field("field_b", d.FieldB)
	io.WriteString(w, `"rows": `)
	// Stream the reader straight to the output without buffering it.
	_, err := io.Copy(w, d.Rows)
	if err != nil {
		return err
	}
	io.WriteString(w, "}\n")
	return nil
}
```
Answer 2 (Score: 0)
I settled on a solution similar to the other answer, but one that doesn't require manually serializing every other field.

I defined the data like this:

```go
type Data struct {
	FieldA string    `json:"field_a"`
	FieldB int       `json:"field_b"`
	Rows   io.Reader `json:"-"`
}
```

The - in the json tag for the Rows field indicates that Go should always skip encoding that field into the JSON output. The encode function then looks like this:
```go
func encode(d Data, writer io.Writer) error {
	// Marshal everything except Rows (skipped via its json:"-" tag).
	buf, err := json.Marshal(d)
	if err != nil {
		return err
	}
	// Write everything except the final closing brace.
	_, err = writer.Write(buf[:len(buf)-1])
	if err != nil {
		return err
	}
	// Open the rows field manually.
	_, err = writer.Write([]byte(`,"rows":`))
	if err != nil {
		return err
	}
	// Stream the rows straight from the reader.
	_, err = io.Copy(writer, d.Rows)
	if err != nil {
		return err
	}
	// Close the object.
	_, err = writer.Write([]byte("}"))
	return err
}
```
- I serialize the object as-is into a buffer (but without the rows)
- Write everything except the final closing brace to the output
- Write ,"rows": to the output to introduce the new field
- Copy the rows reader to the output writer
- Write a final } to the output
It works pretty well: an end-to-end PoC with a gin web server uses about 35 MB of memory to fetch a reader for the rows object from S3, decompress it with zstd, and serialize it directly into the gin output writer. It is also much faster than doing the whole thing in memory, since it can start returning data immediately instead of waiting for the entire payload to be decoded into memory and then re-encoded.
The full PoC can be found here if anybody's interested:
https://github.com/TheEdgeOfRage/streaming-poc