Multiple Arrow CSV Readers on same file returns null









I'm trying to read the same file using multiple Goroutines, where each Goroutine is assigned a byte offset to start reading from and a number of lines to read (lineLimit).

This works when the file fits in memory, by setting the csv.ChunkSize option to the chunkSize variable. However, when the file is larger than memory, I need to reduce the csv.ChunkSize option. I was attempting something like this:

I tried several variations of the code above, including:

  1. Copying the file descriptor
  2. Copying the offset of the file descriptor, opening the same file
    and seeking to that offset.
  3. Closing the first reader before calling flush or closing the first fd.

The error seems to be the same no matter how I change the code. Note that any call on the reader in flush raises an error, including reader.Next() and reader.Err().

Am I using the csv readers wrong? Is this a problem with reusing the same file?

EDIT: I don't know if this helps, but opening a new fd in flush without any Seek avoids the error (Somehow any Seek causes the original error to appear). However, the code is not correct without a Seek (i.e. removing Seek causes a part of the file to not be read at all by any Goroutine).


Score: 1




The main issue is that the csv reader uses a bufio.Reader underneath, which has a default buffer size of 4096 bytes. That means reader.Next() will read more bytes from the file than it needs and cache the extra bytes. The file offset has then already moved past those cached bytes, so if you read directly from the file after reader.Next(), you will miss them.

The demo below shows this behavior:
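As a minimal sketch of the buffering effect, here is a version that uses only the standard library. It assumes encoding/csv over a strings.Reader as a stand-in for the Arrow csv reader over a file; the helper name unreadAfterFirstRecord is made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// unreadAfterFirstRecord reads a single CSV record from data through a
// csv.Reader and reports how many bytes are still unread in the
// underlying source afterwards. (Hypothetical helper for illustration.)
func unreadAfterFirstRecord(data string) (record []string, left int, err error) {
	src := strings.NewReader(data)
	r := csv.NewReader(src) // wraps src in a bufio.Reader internally
	record, err = r.Read()
	if err != nil {
		return nil, 0, err
	}
	// src.Len() is the number of bytes the buffered reader has NOT pulled yet.
	return record, src.Len(), nil
}

func main() {
	data := "a,b\nc,d\ne,f\n"
	rec, left, err := unreadAfterFirstRecord(data)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("first record: %v\n", rec)
	fmt.Printf("bytes left in the source: %d of %d\n", left, len(data))
}
```

Only one record was requested, yet for an input smaller than the buffer the source should be fully drained (0 bytes left). A second reader created on the same source at that point has nothing to read, which mirrors what happens when the file is read directly after reader.Next().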

It seems the second reader exists to keep the read from spilling into the next block of csv data. If you know the offset of the next block of csv data in advance, you can wrap the file in an io.SectionReader so that it reads only the current block. The question does not provide enough information about this part, so it may be better left for a separate question.


  1. fd, _ := os.Open(filename): Never ignore errors. At least log them.
  2. fd means file descriptor most of the time. Don't use it for a variable of type *os.File, especially when *os.File has a method Fd.
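Both notes can be illustrated with a short sketch. The helper fileSize and the file name data.csv are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// fileSize opens path, checks every error, and returns the file's size.
// (Hypothetical helper for illustration.)
func fileSize(path string) (int64, error) {
	f, err := os.Open(path) // named f, not fd: f.Fd() returns the actual descriptor
	if err != nil {
		return 0, fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return 0, fmt.Errorf("stat %s: %w", path, err)
	}
	return info.Size(), nil
}

func main() {
	size, err := fileSize("data.csv") // hypothetical file name
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("size:", size)
}
```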

