What does "Scan advances the Scanner to the next token" mean in Go's bufio.Scanner?
Question
According to the Scanner.Scan documentation, Scan() advances the Scanner to the next token, but what does that mean? I find that Scanner.Text and Scanner.Bytes can return different results, which is puzzling.
This code doesn't always fail, but it does once the file becomes large enough:
func TestScanner(t *testing.T) {
	path := "/tmp/test.txt"
	f, err := os.Open(path)
	if err != nil {
		panic(fmt.Sprint("failed to open ", path))
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	bs := make([][]byte, 0)
	for scanner.Scan() {
		bs = append(bs, scanner.Bytes())
	}
	f, err = os.Open(path)
	if err != nil {
		panic(fmt.Sprint("failed to open ", path))
	}
	defer f.Close()
	scanner = bufio.NewScanner(f)
	ss := make([]string, 0)
	for scanner.Scan() {
		ss = append(ss, scanner.Text())
	}
	for i, b := range bs {
		if string(b) != ss[i] {
			t.Errorf("expect %s, got %s", ss[i], string(b))
		}
	}
}
Answer 1
Score: 4
The token is defined by the scanner's split function. Scan() returns when the split function finds a token or there's an error.
The Text() and Bytes() methods both return the current token. Text() allocates a new string, so it returns a copy of the token. Bytes() does not allocate; it returns a slice whose backing array may be overwritten by a subsequent call to Scan().
Copy the slice returned from Bytes() to avoid this issue:
for scanner.Scan() {
	bs = append(bs, append([]byte(nil), scanner.Bytes()...))
}