January 3, 2017 05:20:42 · go · Comments: 78

Read lines from a file with variable line endings in Go

Question

How can I read lines from a file where the line endings are carriage return (CR), newline (NL), or both?

The PDF specification allows lines to end with CR, LF, or CRLF.

bufio.Reader.ReadString() and bufio.Reader.ReadBytes() accept only a single delimiter byte.
bufio.Scanner.Scan() handles \n optionally preceded by \r, but not a lone \r.
> The end-of-line marker is one optional carriage return followed by one mandatory newline.

Do I need to write my own function that uses bufio.Reader.ReadByte()?

Answer 1

Score: 5

You can write a custom bufio.SplitFunc for bufio.Scanner. For example:

import "bytes"

// Mostly bufio.ScanLines code:
func ScanPDFLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			// We have a line terminated by a single newline.
			return i + 1, data[0:i], nil
		}
		// We have a line terminated by CR, possibly followed by LF (CRLF).
		advance = i + 1
		if len(data) > i+1 && data[i+1] == '\n' {
			advance += 1
		}
		return advance, data[0:i], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}

And use it like:

scan := bufio.NewScanner(r)
scan.Split(ScanPDFLines)

Answer 2

Score: 0

While reading a file generated by an older Mac, which uses only CR line endings, I ran into a regression in an edge case: if a CRLF pair is split across the buffer boundary, the accepted answer treats the CR and the LF as separate line terminators and returns an extra empty line. You basically need to return early and request more data when the buffer ends with CR (unless you are at EOF). This seems to solve it.

import "bytes"

func scanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			// We have a line terminated by a single newline.
			return i + 1, data[0:i], nil
		}
		// The buffer ends with a carriage return: request more data,
		// because the next byte may be the LF of a CRLF pair.
		if !atEOF && len(data) == i+1 {
			return 0, nil, nil
		}
		advance = i + 1
		if len(data) > i+1 && data[i+1] == '\n' {
			advance += 1
		}
		return advance, data[0:i], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}


