Filtering non-JSON content in a JSON stream in Go

Question

I'm working with an input stream of JSON structures in Go. I receive the stream from another application on my stdin, and I can't alter the communication protocol.

The problem is that every JSON structure is terminated by a non-JSON line: "end" (without the quotes).

I'm using Go's encoding/json package to decode the JSON I receive from stdin. The first Decode call succeeds, but the second one fails with: "invalid character 'e' looking for beginning of value".

The issue, of course, is that the "end" string is not JSON. How can I have Go's JSON decoder skip over this string?

Some sample input:

{"command": "ack", "id": "1231231"}
end
{"command": "fail", "id": "1231231"}
end
{
    "command": "log",
    // the message to log
    "msg": "hello world!"
}
end

Things I've tried:

  • I declared: endStr := make([]byte, 10)
  • I've tried using fmt.Fscanf(os.Stdin, "%s", endStr) to read past the string, but no data are read.
  • I've tried to use os.Stdin.Read(endStr), but it also returns no data.
  • After I read the first json structure, dec.Buffered() returns an io.Reader containing the "end" string, but I don't know how to tell the decoder to skip over this.

Any help would be appreciated.

Answer 1

Score: 3

So the best solution I've been able to come up with is:

  1. Ditch json.Decoder,
  2. read a byte slice from stdin,
  3. trim the "\nend\n" terminator off the slice,
  4. pass the trimmed slice to json.Unmarshal.

The code I had to write:

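package main

import (
	"bytes"
	"encoding/json"
	"io"
	"os"
)

// MyStruct is assumed here; its fields follow the sample input
type MyStruct struct {
	Command string `json:"command"`
	ID      string `json:"id"`
	Msg     string `json:"msg"`
}

func main() {
	// Create a buffer to hold the stream data
	buf := make([]byte, 5000)
	// Read data from stdin in a loop
	for {
		n, err := os.Stdin.Read(buf)
		if err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		// Only buf[:n] is valid. Keep what precedes the first newline (assumes
		// each Read returns exactly one single-line message followed by "end").
		data := buf[:n]
		if index := bytes.IndexByte(data, '\n'); index >= 0 {
			data = data[:index]
		}

		var myStruct MyStruct
		if err := json.Unmarshal(data, &myStruct); err != nil {
			panic(err)
		}

		// (Do something with myStruct)
	}
}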

Answer 2

Score: 2

This is messy, but it will do the trick:

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

var input = `{"command": "ack", "id": "1231231"}
end
{"command": "fail", "id": "1231231"}
end
`

func main() {
	// make an io.Reader out of our input constant
	var input = bytes.NewBuffer([]byte(input))
	// we're going to need a buffered reader so we can Peek
	var buf = bufio.NewReader(input)

	var err error
	var dec *json.Decoder
	// We're going to loop until we get an error (hopefully it will be io.EOF)
	for err == nil {
		if dec != nil {
			// This is the tricky bit: json.Decoder has its own buffer.
			// It will read more than the data it needs. In my simple test,
			// it buffers all of the data. What we're doing here is constructing
			// a new bufio.Reader using the remaining bytes in the json decoder's buffer
			// and whatever hasn't been read from our original buffer.
			buf = bufio.NewReader(io.MultiReader(dec.Buffered(), buf))
		}
		// Now let's try to drop an 'end' statement from the buffer
		dropEnd(buf)
		// We need a new json.Decoder each time since the old one contains unusable
		// data in its internal buffer.
		dec = json.NewDecoder(buf)
		// This is the result of the decode. Use whatever makes sense for you.
		// Declaring it inside the loop keeps keys from one message out of the next.
		var res map[string]interface{}
		if err = dec.Decode(&res); err == nil {
			fmt.Println("Read:", res)
		}
	}
	if err != io.EOF {
		fmt.Println("Unexpected error:", err)
	}
}

func dropEnd(buf *bufio.Reader) {
	// If the next 4 bytes (either "\nend" or "end\n") contain "end", discard them
	if check, _ := buf.Peek(4); bytes.Contains(check, []byte("end")) {
		buf.Discard(len(check))
	}
}

You can play with this code here: http://play.golang.org/p/7NER_fTzXI

Answer 3

Score: 1

You can wrap your os.Stdin in a bufio.Reader from the bufio package, then use buf.Peek(n) to look ahead before you call Decode.

You can also use a bufio.Scanner with a custom split function to delimit the JSON chunks.

The advantage of bufio over a static buffer is that it works on a real stream, where a single Read may return part of a message or several messages at once.

Answer 4

Score: 0

If you can restrict your JSON objects to a single line, you just have to split the input into lines and ignore whatever doesn't unmarshal. Here's a small piece of code:

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

var input = `{"command": "ack", "id": "1231231"}
end
{"command": "fail", "id": "1231231"}
end
`

type Obj struct {
	Command string
	Msg     string
	Id      string
}

func DoSomethingCool(o Obj) {
	// Do something cool here
	fmt.Println(o)
}

func main() {
	inputs := strings.Split(input, "\n")
	for _, v := range inputs {
		var obj Obj
		if err := json.Unmarshal([]byte(v), &obj); err == nil {
			DoSomethingCool(obj) // Got a valid JSON
		}
	}
}

  • Posted by huangapple on July 16, 2013, 20:20:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/17676367.html