Restart reading csv file from a defined position





I need to process a big file in Go, so I don't want to load all the rows of my CSV file at once; instead I process them in groups.

To resume the computation from where I left off, I currently use a for loop to skip the rows already read:

for idx := 0; idx < startAt; idx++ {
	// Read rows and discard the returned value
	if _, readErr := reader.Read(); readErr != nil {
		if readErr == io.EOF {
			// End of file -> OK
			isEOF = true
		} else {
			// Read failed
			return nil, errors.New(DATA_READ_ERROR)
		}
	}
}

This is a pretty simple solution, but it is obviously inefficient: every call re-reads all the rows already processed, so each new group takes longer to reach than the previous one.

To reduce this time I tried different alternatives, but none of them works properly: the reader fails because rows are not read from the right offset.

For instance, I tried to return the current position of the file pointer (using file.Seek(0, io.SeekCurrent)) and then, on the next iteration, to move the pointer with file.Seek(oldPosition, io.SeekStart), but it didn't work as expected.

Is there a way to avoid the loop above and improve the reading time when resuming from where I left off?


The way I used file.Seek is very simple:

// Compute data

func computeData(nrows int, startAt int64) int64 {
	// Open the file
	if file, openErr := os.Open(config.DataSrcFile); openErr == nil {
		// Create a reader
		reader := csv.NewReader(file)
		// Position the file pointer at the start point
		file.Seek(startAt, io.SeekStart)
		// Read n rows
		for idx := 0; idx < nrows && !isEOF; idx++ {
			if csvLine, readErr := reader.Read(); readErr == nil {
				// Do stuff...
			} else {
				// Error reading the CSV
				if readErr == io.EOF {
					// End of file -> OK
				} else {
					// Return error
				}
			}
		}
		// Return bytes read (simplified; in the real code the error is
		// not ignored)
		bytesRead, _ := file.Seek(0, io.SeekCurrent)
		return bytesRead
	}
	// Open failed (error handling omitted in this simplified version)
	return 0
}

func main() {
	var startAt int64 = 0
	nrows := 1000
	for !isMyConditionMatched {
		bytesRead := computeData(nrows, startAt)
		startAt += bytesRead
	}
}


Score: 1





The problem here is that encoding/csv internally uses a buffered reader, so when you call file.Seek(0, io.SeekCurrent) you get the position of the underlying file, which is already past data that was read into the buffer but not yet consumed by the CSV reader.

There are two possible solutions:

  • one is to use a lower-level implementation that lets you control exactly where you are in the file
  • the other is to find out how much buffered data there is.

I'll show an implementation of the second option (note that it relies on some knowledge of the internal workings of the encoding/csv package and may stop working if they change).

First, create a buffered reader (bufio.Reader) on the file before creating the CSV reader:

		// Position the file pointer at the start point
		file.Seek(startAt, io.SeekStart)
		bReader := bufio.NewReader(file)

		// Create a reader
		reader := csv.NewReader(bReader)

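This gives you access to the buffer, because csv.NewReader wraps its input with bufio.NewReader, which reuses a *bufio.Reader it is given instead of wrapping it again. You can use the CSV reader as you already do; at the end, compute the final position in the file with: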

		// Bytes read from the file but not yet consumed by the CSV reader
		bufSize := bReader.Buffered()
		// Error ignored here for brevity
		filePos, _ := file.Seek(0, io.SeekCurrent)
		return filePos - int64(bufSize)

This takes the current position in the file and subtracts the number of bytes that are buffered but not yet consumed.

Note that the value returned is the position in the file and not the number of bytes read in this call to the function, so the caller should assign it to startAt rather than add it.

  • Posted on September 16, 2021, 16:15:39
  • When reposting, please keep this link: https://go.coder-hub.com/69204739.html



