May 5, 2015 16:18:32 · go

Golang: Read non valid JSON from text file

Question

I have a txt file with the following sample data:

host{
    Entry {
        id: "foo"
    }
    Entry {
        id: "bar"
    }
}

port{
    Entry {
        id: "lorem"
    }
    Entry {
        id: "ipsum"
    }
}

The file has more than 300 of these Entry values. I'd like to read the file and extract the id values that belong to the port section. It isn't valid JSON, so I can't use the JSON decoder. Is there another way to extract the values?

Answer 1

Score: 1

If the structure is the same throughout and all you want is the id values, you can do something like this (on the Playground, https://play.golang.org/p/6NJQ-r9Fzm):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// This will work only if ids don't have spaces
	fields := strings.Fields(input1)
	for i, field := range fields {
		if field == "id:" && i+1 < len(fields) {
			fmt.Println("Got an id: ", fields[i+1][1:len(fields[i+1])-1])
		}
	}
	fmt.Println()

	// This will extract all strings enclosed in ""
	for i1, i2 := 0, 0; ; {
		i := strings.Index(input2[i1:], "\"") // find the first " starting after the last match
		if i < 0 { // no opening quote left, so we are done
			break
		}
		i1 = i + 1 + i1                       // set the start index to the absolute position in the string
		i2 = strings.Index(input2[i1:], "\"") // find the second "
		if i2 < 0 { // unmatched opening quote, stop instead of slicing out of range
			break
		}
		fmt.Println(input2[i1 : i1+i2]) // print the string between ""
		i1 += i2 + 1                    // set the new starting index to after the last match
	}

	// Reading the text line by line and only processing port sections
	parts := []string{"port{", "  Entry {", "      id: \"foo bar\"", "  }", "   Entry {", "      id: \"more foo bar\"", "  }", "}"}
	isPortSection := false
	for _, part := range parts {
		if strings.HasPrefix(part, "port") {
			isPortSection = true
		}
		if strings.HasPrefix(part, "host") {
			isPortSection = false
		}
		if isPortSection && strings.HasPrefix(strings.TrimSpace(part), "id:") {
			line := strings.TrimSpace(part)
			fmt.Println(line[5 : len(line)-1]) // drop the leading `id: "` and the trailing quote
		}
	}
}

var input1 string = `port{
  Entry {
      id: "foo"
  }
   Entry {
      id: "bar"
  }
}`

var input2 string = `port{
  Entry {
      id: "foo bar"
  }
   Entry {
      id: "more foo bar"
  }
}`

Prints:

Got an id:  foo
Got an id:  bar

foo bar
more foo bar
foo bar
more foo bar

Instead of printing them in the loop, you can store them in a slice or map, or do whatever else you need. And of course, instead of using string literals, you would read the lines from your file.

Answer 2

Score: 1

I believe text/scanner might be very useful here. It's not plug-and-play, but it lets you tokenise the input and parses strings properly (spaces, escaped values, etc.). Here is a quick proof of concept: a scanner with a simple state machine that captures every id: {str} pattern found under the port section:

package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// src is sample input in the format shown in the question.
const src = `host{
  Entry {
    id: "foo"
  }
}
port{
  Entry {
    id: "lorem"
  }
}`

func main() {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))

	// Keep state of parsing process
	const (
		StateNone = iota
		StateID
		StateIDColon
	)
	state := StateNone

	lastToken := ""        // last token text
	sections := []string{} // section stack

	tok := s.Scan()
	for tok != scanner.EOF {
		txt := s.TokenText()
		switch txt {
		case "id":
			if state == StateNone {
				state = StateID
			} else {
				state = StateNone
			}
		case ":":
			if state == StateID {
				state = StateIDColon
			} else {
				state = StateNone
			}
		case "{":
			// Add section
			sections = append(sections, lastToken)
		case "}":
			// Remove section
			if len(sections) > 0 {
				sections = sections[:len(sections)-1]
			}
		default:
			// The length check avoids a panic when an id appears outside any section.
			if state == StateIDColon && len(sections) > 0 && sections[0] == "port" {
				// Our string is here (TokenText keeps the surrounding quotes)
				fmt.Println(txt)
			}
			state = StateNone
		}
		lastToken = txt
		tok = s.Scan()
	}
}

You can play with it here. It would certainly need more work if you have to validate the input structure, etc., but it seems like a good starting point to me.
