Golang: Read non valid JSON from text file
Question
I have a txt file with the following sample data:
host{
Entry {
id: "foo"
}
Entry {
id: "bar"
}
}
port{
Entry {
id: "lorem"
}
Entry {
id: "ipsum"
}
}
It has more than 300 of those Entry values. I'd like to read the file and extract the id values belonging to the port section. It's not valid JSON, so I can't use the JSON decoder. Is there any other way of extracting the values?
Answer 1
Score: 1
If the structure is the same throughout and all you want is the id values, you can do something like this (on the Playground, https://play.golang.org/p/6NJQ-r9Fzm):
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Approach 1: split on whitespace. Works only if ids don't contain spaces.
	fields := strings.Fields(input1)
	for i, field := range fields {
		if field == "id:" && i+1 < len(fields) {
			id := fields[i+1]
			fmt.Println("Got an id: ", id[1:len(id)-1]) // strip the surrounding quotes
		}
	}
	fmt.Println()

	// Approach 2: extract every string enclosed in double quotes.
	for i1 := 0; ; {
		i := strings.Index(input2[i1:], "\"") // find the opening " after the last match
		if i < 0 {
			break // Index returns -1 when there are no more quotes
		}
		i1 += i + 1                            // absolute position just after the opening "
		i2 := strings.Index(input2[i1:], "\"") // find the closing "
		if i2 < 0 {
			break // unmatched quote, stop instead of slicing with -1
		}
		fmt.Println(input2[i1 : i1+i2]) // print the string between the quotes
		i1 += i2 + 1                    // continue after the closing "
	}

	// Approach 3: go through the text line by line and only process port sections.
	parts := []string{"port{", " Entry {", " id: \"foo bar\"", " }", " Entry {", " id: \"more foo bar\"", " }", "}"}
	isPortSection := false
	for _, part := range parts {
		if strings.HasPrefix(part, "port") {
			isPortSection = true
		}
		if strings.HasPrefix(part, "host") {
			isPortSection = false
		}
		if isPortSection && strings.HasPrefix(strings.TrimSpace(part), "id:") {
			line := strings.TrimSpace(part)
			fmt.Println(line[5 : len(line)-1]) // skip `id: "` and drop the closing quote
		}
	}
}

var input1 string = `port{
Entry {
id: "foo"
}
Entry {
id: "bar"
}
}`

var input2 string = `port{
Entry {
id: "foo bar"
}
Entry {
id: "more foo bar"
}
}`
Prints:
Got an id:  foo
Got an id:  bar

foo bar
more foo bar
foo bar
more foo bar
Instead of printing the ids in the loop, you can append them to a slice, put them in a map, or do whatever else you need. And of course, instead of using string literals, you would read the lines from your file.
Answer 2
Score: 1
I believe text/scanner might be very useful here. It's not plug-and-play, but it lets you tokenise the input and parses your strings properly (spaces, escaped values, etc.). Here is a quick proof of concept: a scanner with a simple state machine that captures every id: {str} pattern found in the Entry blocks of the port section:
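package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/scanner"
)

// src is sample input; in practice, read it from your file.
const src = `host{
Entry {
id: "foo"
}
}
port{
Entry {
id: "lorem"
}
Entry {
id: "ipsum"
}
}`

// States of the parsing process
const (
	StateNone    = iota
	StateID      // just saw "id"
	StateIDColon // just saw "id" followed by ":"
)

func main() {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))

	state := StateNone
	lastToken := ""        // last token text
	sections := []string{} // section stack

	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		txt := s.TokenText()
		switch txt {
		case "id":
			if state == StateNone {
				state = StateID
			} else {
				state = StateNone
			}
		case ":":
			if state == StateID {
				state = StateIDColon
			} else {
				state = StateNone
			}
		case "{":
			// Add section
			sections = append(sections, lastToken)
		case "}":
			// Remove section
			if len(sections) > 0 {
				sections = sections[:len(sections)-1]
			}
		default:
			// Check the stack length first so sections[0] cannot panic
			if state == StateIDColon && len(sections) > 0 && sections[0] == "port" {
				// Our string is here; TokenText keeps the quotes, so unquote it
				if id, err := strconv.Unquote(txt); err == nil {
					fmt.Println(id)
				} else {
					fmt.Println(txt)
				}
			}
			state = StateNone
		}
		lastToken = txt
	}
}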
This would certainly need more work if you have to validate the input structure and so on, but it seems like a good starting point to me.