Decode JSON as it is still streaming in via net/http

Question

In the past, I've used Go to decode JSON from an API endpoint in the manner shown below.

client := &http.Client{}

req, err := http.NewRequest("GET", "https://some/api/endpoint", nil)
res, err := client.Do(req)
defer res.Body.Close()

buf, _ := ioutil.ReadAll(res.Body)

// ... Do some error checking etc ...

err = json.Unmarshal(buf, &response)

I am shortly going to be working on an endpoint that could send me several megabytes of JSON data in the following format.

{
    "somefield": "value",
    "items": [
        { LARGE OBJECT },
        { LARGE OBJECT },
        { LARGE OBJECT },
        { LARGE OBJECT },
        ...
    ]
}

The JSON will at some point contain an array of large, arbitrary-length objects. I want to take each of these objects and place it, separately, into a message queue. I do not need to decode the objects themselves.

If I used my normal method, this would load the entire response into memory before decoding it.

Is there a good way to split out each of the LARGE OBJECT items while the response is still streaming in and dispatch it to the queue? I'm doing this to avoid holding so much data in memory.

Answer 1

Score: 30

Decoding a JSON stream is possible with the json.Decoder.

With Decoder.Decode(), we may read (unmarshal) a single value without consuming and unmarshaling the complete stream. This is cool, but your input is a "single" JSON object, not a series of JSON objects, which means a call to Decoder.Decode() would attempt to unmarshal the complete JSON object with all items (large objects).
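To make the distinction concrete, here is a minimal sketch of the case where Decoder.Decode() alone is enough: an input that is a series of separate top-level JSON values. The point type, the sample input and the demo helper are assumptions made up for this illustration; they are not part of the question.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// point is a hypothetical payload type used only for this illustration.
type point struct {
	X int `json:"x"`
	Y int `json:"y"`
}

// decodeSeries reads consecutive top-level JSON values from r, one per
// Decode call, until the input is exhausted.
func decodeSeries(r io.Reader) ([]point, error) {
	dec := json.NewDecoder(r)
	var out []point
	for {
		var p point
		err := dec.Decode(&p)
		if err == io.EOF {
			return out, nil // clean end of the stream
		}
		if err != nil {
			return out, err
		}
		out = append(out, p)
	}
}

// demo feeds two whitespace-separated objects to decodeSeries.
func demo() ([]point, error) {
	return decodeSeries(strings.NewReader(`{"x":1,"y":2} {"x":3,"y":4}`))
}

func main() {
	pts, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(pts), pts)
}
```

Each Decode call consumes exactly one value here. With the input from the question there is only one top-level value, so a single Decode call would have to consume everything.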

What we want is partial, on-the-fly processing of a single JSON object. For this, we may use Decoder.Token(), which parses only the next token in the JSON input stream and returns it. This is called event-driven parsing.

Of course, we have to process (interpret and act upon) the tokens ourselves and build a state machine that keeps track of where we are in the JSON structure we're processing.
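As a rough illustration of what Decoder.Token() hands back, the sketch below walks a small document token by token and tracks the nesting depth, which is about the simplest state one can keep. The describeTokens and demo names and the sample input are assumptions chosen for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// describeTokens walks r token by token and returns one line per token,
// indented by the current nesting depth.
func describeTokens(r io.Reader) ([]string, error) {
	dec := json.NewDecoder(r)
	var lines []string
	depth := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return lines, nil // no more tokens
		}
		if err != nil {
			return lines, err
		}
		indent := strings.Repeat("  ", depth)
		if d, ok := tok.(json.Delim); ok {
			if d == '}' || d == ']' {
				depth-- // closing delimiter: leave the current container
				indent = strings.Repeat("  ", depth)
			}
			lines = append(lines, fmt.Sprintf("%sdelim %v", indent, d))
			if d == '{' || d == '[' {
				depth++ // opening delimiter: enter a new container
			}
			continue
		}
		// Non-delimiter tokens are string, float64, bool or nil.
		lines = append(lines, fmt.Sprintf("%s%T %v", indent, tok, tok))
	}
}

// demo runs describeTokens on a tiny sample document.
func demo() ([]string, error) {
	return describeTokens(strings.NewReader(`{"a": [1, true]}`))
}

func main() {
	lines, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Note that property names arrive as plain string tokens, so the caller has to know from its own state whether a string is a key or a value.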

Here's an implementation that solves your problem.

We will use the following JSON input:

{
    "somefield": "value",
    "otherfield": "othervalue",
    "items": [
        { "id": "1", "data": "data1" },
        { "id": "2", "data": "data2" },
        { "id": "3", "data": "data3" },
        { "id": "4", "data": "data4" }
    ]
}

And read the items, the "large objects" modeled by this type:

type LargeObject struct {
	Id   string `json:"id"`
	Data string `json:"data"`
}

We will also parse and interpret other fields in the JSON object, but we will only log / print them.

For brevity and easy error handling, we'll use this helper error handler function:

he := func(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

And now let's see some action. In the example below, for brevity and to have a working demonstration on the Go Playground, we'll read from a string value. To read from an actual HTTP response body, we only have to change a single line, the one that creates the json.Decoder:

dec := json.NewDecoder(res.Body)
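For context, a sketch of how that line might sit in a complete request could look like the following. It is an illustration under assumptions: a local httptest server stands in for the real endpoint, and topLevelKeys and demo are hypothetical helper names. The decoder reads from res.Body incrementally, and each property value is skipped by decoding it into a json.RawMessage that is then discarded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// topLevelKeys issues a GET request and returns the property names of the
// top-level JSON object, reading the response body through a json.Decoder.
func topLevelKeys(url string) ([]string, error) {
	res, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", res.Status)
	}

	dec := json.NewDecoder(res.Body)
	t, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := t.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("expected object, got %v", t)
	}
	var keys []string
	for dec.More() {
		t, err := dec.Token()
		if err != nil {
			return nil, err
		}
		key, ok := t.(string)
		if !ok {
			return nil, fmt.Errorf("expected property name, got %v", t)
		}
		keys = append(keys, key)
		// Skip the value without interpreting it.
		var skip json.RawMessage
		if err := dec.Decode(&skip); err != nil {
			return nil, err
		}
	}
	return keys, nil
}

// demo serves a fixed document from a local test server (a stand-in for
// the real endpoint) and reads it back through topLevelKeys.
func demo() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"somefield": "value", "items": [1, 2]}`)
	}))
	defer srv.Close()
	return topLevelKeys(srv.URL)
}

func main() {
	keys, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(keys)
}
```

Unlike the snippet in the question, this version checks the error from the request before deferring res.Body.Close(), since res is nil when the request fails.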

So the demonstration:

dec := json.NewDecoder(strings.NewReader(jsonStream))
// We expect an object
t, err := dec.Token()
he(err)
if delim, ok := t.(json.Delim); !ok || delim != '{' {
	log.Fatal("Expected object")
}

// Read props
for dec.More() {
	t, err = dec.Token()
	he(err)
	prop := t.(string) // property names are always string tokens
	if prop != "items" {
		var v interface{}
		he(dec.Decode(&v))
		log.Printf("Property '%s' = %v", prop, v)
		continue
	}

	// It's the "items". We expect it to be an array
	t, err = dec.Token()
	he(err)
	if delim, ok := t.(json.Delim); !ok || delim != '[' {
		log.Fatal("Expected array")
	}
	// Read items (large objects)
	for dec.More() {
		// Read next item (large object)
		lo := LargeObject{}
		he(dec.Decode(&lo))
		fmt.Printf("Item: %+v\n", lo)
	}
	// Array closing delim
	t, err = dec.Token()
	he(err)
	if delim, ok := t.(json.Delim); !ok || delim != ']' {
		log.Fatal("Expected array closing")
	}
}

// Object closing delim
t, err = dec.Token()
he(err)
if delim, ok := t.(json.Delim); !ok || delim != '}' {
	log.Fatal("Expected object closing")
}

This will produce the following output:

2009/11/10 23:00:00 Property 'somefield' = value
2009/11/10 23:00:00 Property 'otherfield' = othervalue
Item: {Id:1 Data:data1}
Item: {Id:2 Data:data2}
Item: {Id:3 Data:data3}
Item: {Id:4 Data:data4}

Try the full, working example on the Go Playground.
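Since the question says the items do not need to be decoded, one possible variation is to read each item into a json.RawMessage and hand the raw bytes to the queue. The sketch below assumes a Go channel as a stand-in for the message queue; streamItems, expectDelim, demo and the sample input are names invented for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

const sample = `{"somefield":"value","items":[{"id":"1","data":"data1"},{"id":"2","data":"data2"}],"trailer":true}`

// expectDelim consumes one token and checks that it is the delimiter want.
func expectDelim(dec *json.Decoder, want json.Delim) error {
	t, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := t.(json.Delim); !ok || d != want {
		return fmt.Errorf("expected %v, got %v", want, t)
	}
	return nil
}

// streamItems sends each element of the top-level "items" array to queue as
// raw, undecoded JSON. All other top-level properties are skipped.
func streamItems(r io.Reader, queue chan<- json.RawMessage) error {
	dec := json.NewDecoder(r)
	if err := expectDelim(dec, '{'); err != nil {
		return err
	}
	for dec.More() {
		t, err := dec.Token()
		if err != nil {
			return err
		}
		if key, _ := t.(string); key != "items" {
			var skip json.RawMessage
			if err := dec.Decode(&skip); err != nil {
				return err
			}
			continue
		}
		if err := expectDelim(dec, '['); err != nil {
			return err
		}
		for dec.More() {
			var item json.RawMessage // fresh slice per item, so the bytes are not overwritten
			if err := dec.Decode(&item); err != nil {
				return err
			}
			queue <- item
		}
		if err := expectDelim(dec, ']'); err != nil {
			return err
		}
	}
	return expectDelim(dec, '}')
}

// demo runs streamItems in a goroutine and collects what arrives on the channel.
func demo() ([]string, error) {
	queue := make(chan json.RawMessage)
	errc := make(chan error, 1)
	go func() {
		defer close(queue)
		errc <- streamItems(strings.NewReader(sample), queue)
	}()
	var got []string
	for item := range queue {
		got = append(got, string(item)) // a real consumer would publish to the message queue here
	}
	return got, <-errc
}

func main() {
	items, err := demo()
	fmt.Println(items, err)
}
```

With this approach only one item is held in memory at a time, in addition to the decoder's internal buffer. The items are still scanned for syntactic validity, but no struct fields are populated.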

Answer 2

Score: 0

If you want to be as efficient as possible, you could read key-value pairs from the stream and tokenize them yourself using the lexer from the mailru/easyjson library:

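r := bufio.NewReader(stream)
for {
	// ReadBytes returns the bytes read so far together with io.EOF at end of input
	pair, err := r.ReadBytes(',')
	if len(pair) > 0 {
		x := jlexer.Lexer{
			Data: pair,
		}
		fmt.Printf("%q = ", x.String())
		x.WantColon()
		fmt.Printf("%d\n", x.Int())
	}
	if err != nil {
		break
	}
}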

Note that error handling and some additional checks are skipped for the sake of simplicity. Here is the full working example: https://play.golang.org/p/kk-7aEotqFd

huangapple
  • Posted on June 1, 2017, 20:11:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/44307219.html