How to read a text file line-by-line in Go when some lines are long enough to cause "bufio.Scanner: token too long" errors?

Question

I have a text file where each line represents a JSON object. I am processing this file in Go with a simple for loop like this:

scanner := bufio.NewScanner(file)
for scanner.Scan() {
   jsonBytes := scanner.Bytes()
   var jsonObject interface{}
   if err := json.Unmarshal(jsonBytes, &jsonObject); err != nil {
      log.Fatal(err)
   }

   // do stuff with "jsonObject"...

}
if err := scanner.Err(); err != nil {
   log.Fatal(err)
}

When this code reaches a line with a particularly large JSON string (~67 KB), I get the error message "bufio.Scanner: token too long".

Is there an easy way to increase the maximum line size that the Scanner returned by NewScanner can read? Or is there a different approach altogether for reading lines that are too large for the default Scanner but are known not to be unsafely large?

Answer 1

Score: 42

You can also do:

scanner := bufio.NewScanner(file)
buf := make([]byte, 0, 64*1024)
scanner.Buffer(buf, 1024*1024)
for scanner.Scan() {
    // do your stuff
}

The second argument to scanner.Buffer() sets the maximum token size. In the example above, you can scan the whole file as long as no line is larger than 1 MB. Note that Buffer must be called before the first call to Scan.

Answer 2

Score: 28

From the package docs:

> Programs that need more control over error handling or large tokens,
> or must run sequential scans on a reader, should use bufio.Reader
> instead.

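It looks like the preferred solution is bufio.Reader.ReadLine. Note, however, that the documentation describes ReadLine as a low-level line-reading primitive and suggests that most callers use ReadBytes('\n') or ReadString('\n') instead.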

Answer 3

Score: 1

You surely don't want to be reading line-by-line in the first place. Why don't you just do this:

d := json.NewDecoder(file)
for {
   var ob whateverType
   err := d.Decode(&ob)
   if err == io.EOF {
       break
   }
   if err != nil {
       log.Fatalf("Error decoding: %v", err)
   }

   // do stuff with "ob"...

}

huangapple
  • Posted on January 15, 2014, 05:24:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/21124327.html