How to write a custom SplitFunc for bufio.Scanner that scans JSON objects
Question
I have code like this:
scanner := bufio.NewScanner(reader)
scanner.Split(splitJSON)
for scanner.Scan() {
	bb := scanner.Bytes()
}
I would like to get only valid JSON objects from the Scanner, one at a time. In some cases the Scanner's buffer may hold bytes like this, where the second object is still incomplete:
{
	"some_object": "name",
	"some_fileds": {}
}
{
	"some_object":
}
In that case I need only the first object:
{
	"some_object": "name",
	"some_fileds": {}
}
For the rest, I should wait until the end of the JSON object arrives.
I have a function like this, but it's horrible and doesn't work:
func splitJSON(
	bb []byte, atEOF bool,
) (advance int, token []byte, err error) {
	print(string(bb))
	if len(bb) < 10 {
		return 0, nil, nil
	}
	var nested, from, to int
	var end bool
	for i, b := range bb {
		if string(b) == "{" {
			if end {
				to = i
				break
			}
			if nested == 0 {
				from = i
			}
			nested++
		}
		if string(b) == "}" {
			nested--
			if nested == 0 {
				to = i
				end = true
			}
		}
	}
	if atEOF {
		return len(bb), bb, nil
	}
	return len(bb[from:to]), bb[from:to], nil
}
Update:
I ended up solving it with this SplitFunc:
func splitJSON(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	reader := bytes.NewReader(data)
	dec := json.NewDecoder(reader)
	var raw json.RawMessage
	if err := dec.Decode(&raw); err != nil {
		return 0, nil, nil
	}
	return len(raw) + 1, raw, nil
}
Answer 1
Score: 1
Use json.Decoder for this. Each Decoder.Decode() call decodes the next JSON-encoded value from the input, which in your case is a JSON object.
If you don't want to decode the JSON objects and just need the raw JSON data (a byte slice), unmarshal into a json.RawMessage.
For example:
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

func main() {
	reader := strings.NewReader(src)
	dec := json.NewDecoder(reader)
	for {
		// RawMessage keeps the bytes of the value without decoding them further.
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			if err == io.EOF {
				break // clean end of input
			}
			fmt.Println("Error:", err)
			return
		}
		fmt.Println("Next:", string(raw))
	}
}

const src = `{
"some_object": "name",
"some_fileds": {}
}
{
"some_object": "foo"
}`
This will output (try it on the Go Playground):
Next: {
"some_object": "name",
"some_fileds": {}
}
Next: {
"some_object": "foo"
}