Golang: Reading a text file with multi-line rows
Question
I have a text file with multi-line records, delimited by a blank line. What would be the best way to read that file record by record in Go?
I think I may have to use a Scanner with my own Split function, but I am wondering if there is a better or easier way that I am missing.
I have tried using my own SplitFunc based on bufio.ScanLines:
func MyScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\n\n"); i >= 0 {
		return i + 1, dropCR(data[0:i]), nil
	}
	if atEOF {
		return len(data), dropCR(data), nil
	}
	return 0, nil, nil
}
But I got an error on the IndexAny call: "syntax error: unexpected semicolon or newline, expecting )". That is now fixed.
Update: Fixed the syntax error above as suggested, but I only get the first line returned. I am reading the file as follows:
scanner.Split(MyScanLines)
scanner.Scan()
fmt.Println(scanner.Text())
Any suggestions?
Example of test file I am trying to read:
Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444
.
.
.
Answer 1
Score: 2
Your way works, but I would advise you to use a bufio.Scanner with its default split function, which scans line by line.
Then you read the file line by line and populate your struct. When you encounter a blank line, put the struct into a slice and start a new one.
Here is an example, taken from one of my open source projects, that demonstrates it:
buffer := [][]string{}
block := []string{}
scanner := bufio.NewScanner(strings.NewReader(data))
for scanner.Scan() {
	l := scanner.Text()
	if len(strings.TrimSpace(l)) != 0 {
		block = append(block, l)
		continue
	}
	// At this point, the scanner has reached an empty line,
	// which means the block is ready to be processed.
	// If the block is not empty, append it to the buffer and empty it.
	if len(block) != 0 {
		buffer = append(buffer, block)
		block = []string{}
	}
}
// Flush the last block if the input does not end with a blank line.
if len(block) != 0 {
	buffer = append(buffer, block)
}
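For comparison, the custom split function from the question can also be made to work. The sketch below is one possible version, not the asker's final code: bytes.IndexAny matches any single byte from its second argument, so "\n\n" there still splits on every newline, whereas bytes.Index searches for the whole two-byte separator. The names scanBlocks and readBlocks are hypothetical, and the code assumes Unix line endings with records separated by exactly one blank line.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanBlocks is a bufio.SplitFunc that returns one blank-line-delimited
// record per token. Assumption: the separator is exactly "\n\n".
func scanBlocks(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Look for the full separator, then skip both newline bytes.
	if i := bytes.Index(data, []byte("\n\n")); i >= 0 {
		return i + 2, data[:i], nil
	}
	if atEOF {
		// Final record: drop trailing newlines, and skip it if nothing is left.
		last := bytes.TrimRight(data, "\n")
		if len(last) == 0 {
			return len(data), nil, nil
		}
		return len(data), last, nil
	}
	// No separator yet: ask the Scanner for more data.
	return 0, nil, nil
}

// readBlocks calls Scan in a loop so every record is returned,
// not only the first one.
func readBlocks(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(scanBlocks)
	var blocks []string
	for scanner.Scan() {
		blocks = append(blocks, scanner.Text())
	}
	return blocks, scanner.Err()
}

func main() {
	input := "Name = \"John\"\nVal1 = 700\n\nName = \"Pete\"\nVal1 = 555\n"
	blocks, err := readBlocks(input)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for i, b := range blocks {
		fmt.Printf("record %d:\n%s\n", i, b)
	}
}
```

Note that a Scanner limits the size of a single token (64 KiB by default), so very large records would need a call to scanner.Buffer or the line-by-line approach above.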
Answer 2
Score: 2
Breaking it down: first, understand scanning and make sure that part works:
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		l := scanner.Text()
		fmt.Println(l)
	}
}

var data = `
Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444
`
Here is the code on the Go playground.
Next, gather the data you need into a slice. There is probably a way to check for end of file (EOF) explicitly, but I wasn't able to find it, so the last block is flushed after the loop ends. This is what I came up with, and it works:
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	buffer := [][]string{}
	block := []string{}
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		l := scanner.Text()
		if len(l) != 0 {
			block = append(block, l)
			continue
		}
		if len(l) == 0 && len(block) != 0 {
			buffer = append(buffer, block)
			block = []string{}
			continue
		}
		if len(l) == 0 {
			block = []string{}
			continue
		}
	}
	if len(block) != 0 {
		buffer = append(buffer, block)
		block = []string{}
	}
	fmt.Println("PRINTING BUFFER - END OF PROGRAM - ALL DATA PROCESSED:", buffer)
}

var data = `
Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444
`
Here is the code on the playground.
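On the EOF question: with bufio.Scanner there is no separate EOF check to make. Scan returns false when the input is exhausted or when an error occurs, and Err returns nil in the EOF case. The following is a minimal sketch of that pattern; countLines is a hypothetical helper used only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countLines (hypothetical helper) scans the input to the end and
// reports how many lines were seen, including blank ones.
func countLines(input string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	n := 0
	for scanner.Scan() {
		n++
	}
	// Scan returned false: either EOF was reached or reading failed.
	// Err returns nil for EOF, so a non-nil value is a real error.
	if err := scanner.Err(); err != nil {
		return n, err
	}
	return n, nil
}

func main() {
	n, err := countLines("Name = \"John\"\nVal1 = 700\n\nName = \"Pete\"\n")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Println("lines read before EOF:", n)
}
```

Reaching the line after the loop with a nil Err therefore means the whole input was consumed, which is why flushing the last block after the loop is sufficient.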
Answer 3
Score: 2
Here is an alternative approach that does the same thing using bufio.Reader. The logic is similar to Elwiner's answer.
The myReadLine function below uses bufio.Reader to read the next multi-line entry in the file.
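func myReadLine(file *os.File, reader *bufio.Reader) (lines []string, err error) {
	for {
		// Use a distinct name so the named result err is not shadowed.
		line, _, readErr := reader.ReadLine()
		if readErr != nil {
			// io.EOF just ends the entry; any other error is reported.
			if readErr != io.EOF {
				err = readErr
			}
			break
		}
		if len(line) == 0 {
			break // a blank line ends the entry
		}
		lines = append(lines, string(line))
	}
	return lines, err
}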
The following sample shows how to use the function above:
reader := bufio.NewReader(file)
for {
	lines, err := myReadLine(file, reader)
	if err != nil || len(lines) == 0 {
		break
	}
	fmt.Println(lines)
}
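One caveat with ReadLine is that it can return a long line in several pieces (its second return value, isPrefix, signals this). A possible variant, sketched here under the assumption that whole lines fit in memory, uses ReadString('\n') instead. The names readEntry and readEntries are hypothetical and not part of the answer above; unlike myReadLine, this version also skips repeated blank lines rather than stopping at them.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readEntry reads lines up to the next blank line or EOF.
// It returns io.EOF only when there is no entry left to return.
func readEntry(reader *bufio.Reader) ([]string, error) {
	var lines []string
	for {
		line, err := reader.ReadString('\n')
		line = strings.TrimRight(line, "\r\n")
		if line != "" {
			lines = append(lines, line)
		}
		if err != nil {
			if err == io.EOF && len(lines) > 0 {
				return lines, nil // last entry; the next call reports io.EOF
			}
			return lines, err
		}
		if line == "" && len(lines) > 0 {
			return lines, nil // a blank line ends the entry
		}
		// A blank line before any content is skipped.
	}
}

// readEntries collects all entries from the input.
func readEntries(input string) ([][]string, error) {
	reader := bufio.NewReader(strings.NewReader(input))
	var entries [][]string
	for {
		lines, err := readEntry(reader)
		if err == io.EOF {
			return entries, nil
		}
		if err != nil {
			return entries, err
		}
		entries = append(entries, lines)
	}
}

func main() {
	entries, err := readEntries("Name = \"John\"\nVal1 = 700\n\nName = \"Pete\"\nVal1 = 555")
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(e)
	}
}
```

For a real file, the reader would wrap the *os.File returned by os.Open instead of a strings.Reader.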
Answer 4
Score: 1
The Scan method of bufio.Scanner returns false on EOF.
We will return a second 'ok' value, so our caller can tell if we have hit the end of our input.
It is best to accumulate our record in a slice of strings and concatenate at the end.
The obvious way of appending each line in turn to the result string will work, but it is O(n^2) in the number of lines.
Putting it all together:
func ReadBlock(scanner *bufio.Scanner) (string, bool) {
	var o []string
	if !scanner.Scan() {
		return "", false
	}
	for len(scanner.Text()) > 0 {
		o = append(o, scanner.Text())
		if !scanner.Scan() {
			break
		}
	}
	return strings.Join(o, " "), true
}
https://play.golang.org/p/C_fB8iaYJo
P.S. Looking at your input, I suspect you would want to return the result as a map rather than a concatenated string.
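A rough sketch of that map idea follows. It assumes every line in a block has the form `Key = value`, that surrounding double quotes should be stripped, and that Go 1.18 or later is available for strings.Cut. The parseBlock name is hypothetical and not part of the answer above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBlock turns the lines of one record into a key/value map.
// Assumption: each line looks like `Key = value`.
func parseBlock(lines []string) (map[string]string, error) {
	rec := make(map[string]string, len(lines))
	for _, line := range lines {
		// Split on the first "=" only, so values may contain "=".
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		key = strings.TrimSpace(key)
		// Strip whitespace, then any surrounding double quotes.
		value = strings.Trim(strings.TrimSpace(value), `"`)
		rec[key] = value
	}
	return rec, nil
}

func main() {
	rec, err := parseBlock([]string{`Name = "John"`, `Surname = "Smith"`, `Val1 = 700`})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(rec["Name"], rec["Surname"], rec["Val1"])
}
```

All values stay strings here; converting fields such as Val1 to integers with strconv.Atoi would be a separate step.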