How to overwrite a file with sequential chunks in Golang

Question

How can I read a large file in chunks, process each chunk sequentially, and then write each processed chunk back to exactly where it came from (the same position or offset in the file)?

For example, I want to read a 1 GB file in 4096-byte chunks, do something with each chunk such as removing special characters (!@#$...), replace the original content with the result, and move on to the next 4096-byte chunk until the end of the file.

> I don't want to load the whole file into memory. The order and offset of the chunks matter a lot, and the main problem is sequentially reading and overwriting chunks of the same file.

What I've done so far:

func main() {
	file, err := os.Open("file.txt")
	if err != nil {
		log.Println(err)
	}
	chunkSize := 4096
	current := make([]byte, chunkSize)

	for {
		// read the file in chunks of 4096 bytes
		_, err := file.Read(current)
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}

		// process the chunk
		processedChunk := process(current)

		// we open the same file again with O_APPEND for overwriting the content, right?
		file2, err := os.OpenFile("file.txt", os.O_WRONLY|os.O_APPEND, os.ModePerm)
		if err != nil {
			log.Println(err)
		}

		// How do I go on from here and overwrite the current chunk with processedChunk?
	}
}

func process(data []byte) []byte {
	// do something with the chunk
	return data
}

Answer 1

Score: 1

Simply open the file in read-write mode, and use File.WriteAt() to write back the modified slice.

Note that File.Read() might not fill the full slice, especially if you're at the end of the file (and there's no more data), so store and use the number of read bytes it returns:

n, err := file.Read(current)
// ...
processedChunk := process(current[:n])

And don't forget to close the file!

Here's the complete solution:

file, err := os.OpenFile("file.txt", os.O_RDWR, 0755)
if err != nil {
	log.Fatal(err)
}
defer file.Close()

current := make([]byte, 4096)

for pos := int64(0); ; {
	n, err := file.Read(current)
	if err != nil {
		if err == io.EOF {
			break
		}
		log.Fatal(err)
	}

	processedChunk := process(current[:n])
	if _, err := file.WriteAt(processedChunk, pos); err != nil {
		log.Fatal(err)
	}

	pos += int64(n)
}

Answer 2

Score: 0

package main

import (
	"io"
	"log"
	"os"
)

func main() {
	// Open the same file twice: one handle for reading, one for writing.
	file, err := os.Open("file.xt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	file2, err := os.OpenFile("file.xt", os.O_WRONLY|os.O_CREATE, os.ModePerm)
	if err != nil {
		log.Fatal(err)
	}
	defer file2.Close()

	chunkSize := 4
	current := make([]byte, chunkSize)

	var seeker int64
	for {
		// Read one chunk at the current offset. ReadAt returns io.EOF
		// together with n > 0 for a final partial chunk, so handle the
		// bytes before checking for EOF.
		readByteCount, err := file.ReadAt(current, seeker)
		if err != nil && err != io.EOF {
			log.Fatal(err)
		}
		if readByteCount > 0 {
			processedChunk := process(current[:readByteCount])

			// Write the result back at the offset the chunk was read from.
			// If process returned fewer bytes, the rest of the chunk keeps
			// its old content.
			if _, werr := file2.WriteAt(processedChunk, seeker); werr != nil {
				log.Fatal(werr)
			}
			seeker += int64(readByteCount)
		}
		if err == io.EOF {
			break
		}
	}
}

func process(data []byte) []byte {
	// drop every ';' from the chunk
	var filtered []byte
	for _, char := range data {
		if char != ';' {
			filtered = append(filtered, char)
		}
	}
	return filtered
}

File.WriteAt() writes at an explicit offset (here, seeker) rather than at the file's current position. Advancing seeker by the number of bytes read keeps each processed chunk aligned with the place it was read from.

huangapple
  • Posted on March 31, 2022, 16:59:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/71689676.html