Read lines from a file with variable line endings in Go
Question
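You can use bufio.Scanner to read lines from a file. With its default split function it handles lines ending in LF or CRLF: a \n optionally preceded by \r is treated as the line terminator. It does not handle a lone \r.
The following example shows how to read the lines of a file with bufio.Scanner: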
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	file, err := os.Open("file.txt")
	if err != nil {
		fmt.Println("Failed to open file:", err)
		return
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		fmt.Println(line)
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("Error while reading file:", err)
	}
}
Save the code above as a .go file and replace file.txt with the path of the file you want to read. When run, it prints the contents of the file line by line.
How can I read lines from a file where the line endings are carriage return (CR), newline (NL), or both?
The PDF specification allows lines to end with CR, LF, or CRLF.
- bufio.Reader.ReadString() and bufio.Reader.ReadBytes() allow a single delimiter byte.
- bufio.Scanner.Scan() handles \n optionally preceded by \r, but not a lone \r.
> The end-of-line marker is one optional carriage return followed by one mandatory newline.
Do I need to write my own function that uses bufio.Reader.ReadByte()?
Answer 1
Score: 5
You can write a custom bufio.SplitFunc for bufio.Scanner. For example:
// Mostly bufio.ScanLines code:
func ScanPDFLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			// We have a line terminated by single newline.
			return i + 1, data[0:i], nil
		}
		advance = i + 1
		if len(data) > i+1 && data[i+1] == '\n' {
			advance += 1
		}
		return advance, data[0:i], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}
And use it like:
scan := bufio.NewScanner(r)
scan.Split(ScanPDFLines)
Answer 2
Score: 0
While reading a file generated on an older Mac, with CR-only line endings, I ran into a regression in one edge case: if a CRLF pair is split across the buffer boundary, the accepted answer treats the CR and the LF as separate line terminators. You need to exit early and request more data if the buffer ends with CR. This seems to solve it:
func scanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			// We have a line terminated by single newline.
			return i + 1, data[0:i], nil
		}
		// We have a line terminated by carriage return at the end of the buffer.
		if !atEOF && len(data) == i+1 {
			return 0, nil, nil
		}
		advance = i + 1
		if len(data) > i+1 && data[i+1] == '\n' {
			advance += 1
		}
		return advance, data[0:i], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}