Golang: Reading a text file with multi-line rows

Question

I have a text file with multi-line rows, separated by blank lines. What would be the best way to read it row by row in Go?

I think I may have to use a Scanner with my own split function, but I'm wondering whether there is a better or easier way that I am missing.

I have tried writing my own SplitFunc based on bufio.ScanLines:

func MyScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\n\n"); i >= 0 {
		return i + 1, dropCR(data[0:i]), nil
	}
	if atEOF {
		return len(data), dropCR(data), nil
	}
	return 0, nil, nil
}

But I got an error on the IndexAny call:
"syntax error: unexpected semicolon or newline, expecting )" (now fixed).

Update: I fixed the syntax error above as suggested, but only the first line is returned. I am reading the file as follows:

scanner.Split(MyScanLines)
scanner.Scan()
fmt.Println(scanner.Text())

Any suggestions?

Example of the test file I am trying to read:

Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444

...

Answer 1

Score: 2

Your approach can work, but I would advise you to use a bufio.Scanner, which scans line by line by default.
Then just read your file line by line and populate your struct. When you encounter a blank line, put your struct into a slice and start a new one.

Here is an example, taken from one of my open-source projects, that demonstrates it:

buffer := [][]string{}
block := []string{}
scanner := bufio.NewScanner(strings.NewReader(data))
for scanner.Scan() {
	l := scanner.Text()

	if len(strings.TrimSpace(l)) != 0 {
		block = append(block, l)
		continue
	}

	// At this point, the scanner has reached an empty line,
	// which means the block is complete.
	// If the block is not empty, append it to the buffer and start a new one.
	if len(block) != 0 {
		buffer = append(buffer, block)
		block = []string{}
	}
}

// The input may not end with a blank line, so flush the last block.
if len(block) != 0 {
	buffer = append(buffer, block)
}

Answer 2

Score: 2

Broken into steps. First, understand scanning and make sure it works:

package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		l := scanner.Text()
		fmt.Println(l)

	}

}

var data = `
Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444
`

Here is the code on the Go playground.

Next, gather the data you need into a slice. There is probably a way to check for end of file (EOF), but I wasn't able to find it. This is what I came up with, and it works:

package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	buffer := [][]string{}
	block := []string{}
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		l := scanner.Text()

		// A non-empty line belongs to the current block.
		if len(l) != 0 {
			block = append(block, l)
			continue
		}

		// An empty line closes the current block, if there is one.
		if len(block) != 0 {
			buffer = append(buffer, block)
			block = []string{}
		}
	}

	// The last block may not be followed by an empty line, so flush it here.
	if len(block) != 0 {
		buffer = append(buffer, block)
	}

	fmt.Println("PRINTING BUFFER - END OF PROGRAM - ALL DATA PROCESSED:", buffer)

}

var data = `
Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444
`

Here is the code on the playground.

Answer 3

Score: 2

Here is an alternative approach that does the same thing using bufio.Reader. The logic is very similar to Elwiner's answer.

The myReadLine function below uses a bufio.Reader to read the next multi-line entry from the file.

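func myReadLine(reader *bufio.Reader) ([]string, error) {
	var lines []string
	for {
		// isPrefix is ignored, so a line longer than the reader's buffer
		// would come back in pieces, each stored as a separate line.
		line, _, err := reader.ReadLine()
		if err != nil {
			// Deliver a final entry that is not followed by a blank line;
			// io.EOF is then reported by the next call.
			if err == io.EOF && len(lines) > 0 {
				return lines, nil
			}
			return lines, err
		}
		if len(line) == 0 {
			return lines, nil // a blank line ends the entry
		}
		lines = append(lines, string(line))
	}
}

The following code shows how to use it:

reader := bufio.NewReader(file)
for {
	lines, err := myReadLine(reader)
	if err != nil {
		break // io.EOF, or a read error
	}
	if len(lines) == 0 {
		continue // skip extra blank lines between entries
	}
	fmt.Println(lines)
}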

Answer 4

Score: 1

(*bufio.Scanner).Scan returns false at EOF (and on a read error).
We return a second 'ok' value so that our caller can tell whether we have hit the end of the input.

It is best to accumulate the record in a slice of strings and concatenate at the end.
The obvious approach of appending each line in turn to the result string also works, but it is O(n^2) in the number of lines, because each concatenation copies the whole string built so far.

Putting it all together:

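// ReadBlock returns the next blank-line-separated record from scanner, with
// its lines joined by spaces. ok is false once the input is exhausted; the
// caller can then check scanner.Err() for a read error.
func ReadBlock(scanner *bufio.Scanner) (string, bool) {
	var o []string
	// Skip blank lines before the record, e.g. a leading newline or
	// several blank lines between records.
	for {
		if !scanner.Scan() {
			return "", false
		}
		if len(scanner.Text()) > 0 {
			break
		}
	}

	// Collect lines until the next blank line or the end of the input.
	for len(scanner.Text()) > 0 {
		o = append(o, scanner.Text())
		if !scanner.Scan() {
			break
		}
	}
	return strings.Join(o, " "), true
}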

https://play.golang.org/p/C_fB8iaYJo

P.S. Looking at your input, I suspect you would want to return the result as a map rather than a concatenated string.

huangapple
  • Published on December 23, 2015, 20:00:51
  • When reposting, please keep this link: https://go.coder-hub.com/34435246.html