What does "Scan advances the Scanner to the next token" mean in Go's bufio.Scanner?

Question

According to the Scanner.Scan documentation, Scan() advances the Scanner to the next token, but what does that mean? I find that the results of Scanner.Text and Scanner.Bytes can differ, which is puzzling.

This test doesn't always fail, but it starts failing once the file gets large enough:

    import (
        "bufio"
        "fmt"
        "os"
        "testing"
    )

    func TestScanner(t *testing.T) {
        path := "/tmp/test.txt"

        // First pass: collect every line as a []byte.
        f, err := os.Open(path)
        if err != nil {
            panic(fmt.Sprint("failed to open ", path))
        }
        defer f.Close()
        scanner := bufio.NewScanner(f)
        bs := make([][]byte, 0)
        for scanner.Scan() {
            bs = append(bs, scanner.Bytes())
        }

        // Second pass: collect every line as a string.
        f, err = os.Open(path)
        if err != nil {
            panic(fmt.Sprint("failed to open ", path))
        }
        defer f.Close()
        scanner = bufio.NewScanner(f)
        ss := make([]string, 0)
        for scanner.Scan() {
            ss = append(ss, scanner.Text())
        }

        // Compare the two passes line by line.
        for i, b := range bs {
            if string(b) != ss[i] {
                t.Errorf("expect %s, got %s", ss[i], string(b))
            }
        }
    }

Answer 1

Score: 4

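A token is whatever the Scanner's split function defines it to be; the default split function, bufio.ScanLines, yields one line per token. Scan() returns when the split function finds a token, or when it reaches the end of the input or hits an error.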
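To make the role of the split function concrete, here is a minimal sketch that scans the same input with two of the standard split functions. The helper names (tokensWith, lineTokens, wordTokens) and the sample input are made up for illustration; only the bufio calls come from the standard library.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tokensWith scans input with the given split function and returns every
// token as a string. Each successful call to Scan advances by one token.
// (Hypothetical helper, not part of bufio.)
func tokensWith(input string, split bufio.SplitFunc) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(split)
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out
}

// lineTokens uses bufio.ScanLines, the default: one token per line.
func lineTokens(input string) []string {
	return tokensWith(input, bufio.ScanLines)
}

// wordTokens uses bufio.ScanWords: one token per space-separated word.
func wordTokens(input string) []string {
	return tokensWith(input, bufio.ScanWords)
}

func main() {
	input := "alpha beta\ngamma"
	fmt.Printf("%q\n", lineTokens(input)) // ["alpha beta" "gamma"]
	fmt.Printf("%q\n", wordTokens(input)) // ["alpha" "beta" "gamma"]
}
```

The same input produces two tokens or three depending only on the split function, which is all that "the next token" refers to.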

The Text() and Bytes() methods both return the current token. Text() returns a newly allocated string, that is, a copy of the token. Bytes() does not allocate; it returns a slice whose backing array may be overwritten by a subsequent call to Scan(). That is why the slices saved in bs can change once the file is large enough for the Scanner to reuse its internal buffer.

Copy the slice returned from Bytes() to avoid this issue:

    for scanner.Scan() {
        // Copy the token before the next Scan can overwrite it.
        bs = append(bs, append([]byte(nil), scanner.Bytes()...))
    }
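
The fix can be checked without a file on disk. The following is a rough sketch that assumes an in-memory input larger than the Scanner's initial buffer; makeInput, collectCopies, collectTexts and sameLines are hypothetical helper names, not part of bufio.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// makeInput builds n distinct lines, "line 0" through "line n-1".
func makeInput(n int) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		fmt.Fprintf(&b, "line %d\n", i)
	}
	return b.String()
}

// collectCopies stores an independent copy of every token.
func collectCopies(input string) [][]byte {
	scanner := bufio.NewScanner(strings.NewReader(input))
	var out [][]byte
	for scanner.Scan() {
		// scanner.Bytes() may point into the Scanner's own buffer,
		// so copy it before the next call to Scan.
		out = append(out, append([]byte(nil), scanner.Bytes()...))
	}
	return out
}

// collectTexts stores every token as a string; Text already copies.
func collectTexts(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out
}

// sameLines reports whether both collections hold the same lines.
func sameLines(bs [][]byte, ss []string) bool {
	if len(bs) != len(ss) {
		return false
	}
	for i := range bs {
		if string(bs[i]) != ss[i] {
			return false
		}
	}
	return true
}

func main() {
	input := makeInput(5000) // large enough that the buffer gets reused
	fmt.Println(sameLines(collectCopies(input), collectTexts(input))) // true
}
```

With the copy in place the two passes agree regardless of input size. Real code reading from a file should also check scanner.Err() after each loop, which this sketch omits because a strings.Reader does not fail.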

Posted on June 6, 2017, 12:19:52