What does "Scan advances the Scanner to the next token" mean in Go's bufio.Scanner?


Question

According to the documentation for Scanner.Scan, Scan() advances the Scanner to the next token, but what does that mean? I find that the results of Scanner.Text and Scanner.Bytes can differ, which is puzzling.

This test doesn't always fail, but it does once the file gets large enough:

package main

import (
	"bufio"
	"fmt"
	"os"
	"testing"
)

func TestScanner(t *testing.T) {
	path := "/tmp/test.txt"
	f, err := os.Open(path)
	if err != nil {
		panic(fmt.Sprint("failed to open ", path))
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)

	// First pass: keep the slices returned by Bytes() as they are.
	bs := make([][]byte, 0)
	for scanner.Scan() {
		bs = append(bs, scanner.Bytes())
	}

	// Second pass over the same file: keep the strings returned by Text().
	f, err = os.Open(path)
	if err != nil {
		panic(fmt.Sprint("failed to open ", path))
	}
	defer f.Close()
	scanner = bufio.NewScanner(f)
	ss := make([]string, 0)
	for scanner.Scan() {
		ss = append(ss, scanner.Text())
	}

	// Compare the two passes token by token.
	for i, b := range bs {
		if string(b) != ss[i] {
			t.Errorf("expect %s, got %s", ss[i], string(b))
		}
	}
}

Answer 1

Score: 4

A token is defined by the scanner's split function; the default, bufio.ScanLines, makes each line a token. Scan() returns true once the split function has found a token, and false when the input ends or an error occurs.

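The Text() and Bytes() methods both return the current token. Text() returns a newly allocated string holding a copy of the token. Bytes() does not allocate; it returns a slice whose backing array may be overwritten by a subsequent call to Scan(). That is presumably why the test only fails on larger files: once the input no longer fits in the scanner's buffer in one read, the buffer gets reused and the slices saved earlier change underneath you.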

Copy the slice returned from Bytes() to avoid this issue:

for scanner.Scan() {
    bs = append(bs, append([]byte(nil), scanner.Bytes()...))
}
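The following is a self-contained sketch of the same idea, small enough to run without a file. It assumes a deliberately tiny buffer (set through Scanner.Buffer) so that the scanner has to reuse its storage after a couple of lines; the helper name `collect`, the buffer size, and the sample input are all illustrative choices. Whether a given aliased slice actually ends up corrupted depends on the scanner's internal buffering, so the program prints the comparison rather than promising a particular result.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// collect is a hypothetical helper. It scans input line by line and returns
// each token three ways: as a string from Text(), as the raw slice from
// Bytes(), and as an independent copy of that slice.
func collect(input string, bufSize int) (texts []string, aliased, copied [][]byte) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Assumption for the demo: a small fixed buffer, so storage is reused early.
	scanner.Buffer(make([]byte, bufSize), bufSize)
	for scanner.Scan() {
		b := scanner.Bytes()
		texts = append(texts, scanner.Text())              // Text() already copies
		aliased = append(aliased, b)                       // shares the scanner's buffer
		copied = append(copied, append([]byte(nil), b...)) // private copy
	}
	return texts, aliased, copied
}

func main() {
	texts, aliased, copied := collect("aaa\nbbb\nccc\nddd\n", 8)
	for i, want := range texts {
		fmt.Printf("want %q  aliased %q  copied %q\n", want, aliased[i], copied[i])
	}
}
```

The copied column always matches Text(), while the aliased column may show later lines in place of earlier ones. On Go 1.20 or later, bytes.Clone(scanner.Bytes()) is an equivalent way to make the copy.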

huangapple
  • Posted on June 6, 2017, 12:19:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/44381350.html