Add .gz files to tar.gz file but decode gz before adding. Output files getting clipped (corrupted)

Question

Below is a snippet of my code, which collects some gzip-compressed PDF files.

I want to add the PDFs to a tar.gz file, but they need to be decompressed (gunzipped) before they are added. I don't want to end up with a tar.gz filled with pdf.gz files.

I need to decompress them without reading the entire file into memory. The PDF files in the tar.gz are clipped and corrupted. When I compare the files from the tar.gz with the original PDF files, they look identical except that the files from the tar.gz are clipped: the last part of each file is missing.

// Create new gz writer with compression level 1
gzw, _ := gzip.NewWriterLevel(w, 1)
defer gzw.Close()

// Create new tar writer
tw := tar.NewWriter(gzw)
defer tw.Close()

file_path := "path-to-file.pdf.gz"
file_name := "filename-shown-in-tar.pdf"

// Open file to add to tar
fp, err := os.Open(file_path)
if err != nil {
	log.Printf("Error: %v", err)
}
defer fp.Close()

file_name := file[1]+file_ext

info, err := fp.Stat()
if err != nil {
	log.Printf("Error: %v", err)
}
header, err := tar.FileInfoHeader(info, file_name)
if err != nil {
	log.Printf("Error: %v", err)
}
header.Name = file_name

tw.WriteHeader(header)

// This part writes the *.pdf.gz files directly to the tar.gz file.
// It works: the tar.gz file can be opened, and afterwards the
// individual pdf.gz files can be opened too.
//io.Copy(tw, fp)

// This part decodes the gz before adding, but it clips the pdf files in
// the tar.gz file
gzr, err := gzip.NewReader(fp)
if err != nil {
	log.Printf("Error: %v", err)
}
defer gzr.Close()
io.Copy(tw, gzr)

Update

I got a suggestion from a comment, but now the PDF files inside the tar can't be opened. The tar.gz file is created and can be opened, but the PDF files inside are corrupted.

I have tried comparing the output files from the tar.gz with the original PDF. It looks like the corrupted file is missing the last part of the file.

In one example, the original file has 498 lines and the corrupted one has only 425, but those 425 lines appear to be identical to the original. Somehow the last part is simply clipped.

Answer 1

Score: 3

The issue appears to be that you build the tar header from the original file, which is compressed. The size in particular causes the problem: the header's Size is the compressed size, and if you try to write more bytes than Size indicates, archive/tar.Writer.Write() returns ErrWriteTooLong; see https://github.com/golang/go/blob/d5efd0dd63a8beb5cc57ae7d25f9c60d5dea5c65/src/archive/tar/writer.go#L428-L429. Because the return value of io.Copy is ignored in your code, the error goes unnoticed and each file is cut off at the compressed size.
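
The clipping can be reproduced in isolation. The following is a minimal sketch, not taken from the question's code: the helper name writePastSize and the entry name demo.pdf are made up for illustration. It declares a Size smaller than the payload and reports how many bytes the tar writer accepted and whether the write failed with ErrWriteTooLong.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
)

// writePastSize (hypothetical helper) declares size bytes in the tar header
// and then tries to write all of payload. It returns how many bytes the tar
// writer accepted and whether the write failed with tar.ErrWriteTooLong.
func writePastSize(size int64, payload []byte) (int, bool) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)

	// "demo.pdf" is an illustrative entry name only.
	hdr := &tar.Header{Name: "demo.pdf", Mode: 0600, Size: size}
	if err := tw.WriteHeader(hdr); err != nil {
		return 0, false
	}

	// The writer accepts at most size bytes for this entry.
	n, err := tw.Write(payload)
	return n, errors.Is(err, tar.ErrWriteTooLong)
}

func main() {
	// Header says 3 bytes, payload is 5 bytes.
	n, tooLong := writePastSize(3, []byte("hello"))
	fmt.Println(n, tooLong)
}
```

Checking the error returned by io.Copy(tw, gzr) in the original code would surface the same condition instead of silently producing truncated entries.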

Something like the following should work: the file is decompressed and read in full so that an accurate size can be set in the header. Note that this holds the whole decompressed file in memory.

// Open file to add to tar
fp, err := os.Open(file_path)
if err != nil {
	log.Printf("Error: %v", err)
}
defer fp.Close()

gzr, err := gzip.NewReader(fp)
if err != nil {
	panic(err)
}
defer gzr.Close()

data, err := io.ReadAll(gzr)
if err != nil {
	log.Printf("Error: %v", err)
}

// Create tar header for file
header := &tar.Header{
	Name: file_name,
	Mode: 0600,
	Size: int64(len(data)),
}

// Write header to the tar
if err = tw.WriteHeader(header); err != nil {
	log.Printf("Error: %v", err)
}

// Write the file content to the tar
if _, err = tw.Write(data); err != nil {
	log.Printf("Error: %v", err)
}
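
Since the question asks to avoid holding the whole file in memory, one possible alternative is to decompress the file twice: once into io.Discard to count the bytes for the header, and once more to stream them into the tar. The sketch below assumes the source is a seekable file on disk; addGunzipped and roundTrip are hypothetical names, and roundTrip exists only to exercise the sketch.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// addGunzipped streams the decompressed content of the gzip file at path into
// tw under name. Pass 1 counts the bytes, pass 2 copies them.
func addGunzipped(tw *tar.Writer, path, name string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gzr, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer gzr.Close()

	// Pass 1: learn the decompressed size without buffering the data.
	size, err := io.Copy(io.Discard, gzr)
	if err != nil {
		return err
	}

	// Rewind the file and restart decompression for pass 2.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}
	if err := gzr.Reset(f); err != nil {
		return err
	}

	hdr := &tar.Header{Name: name, Mode: 0600, Size: size}
	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	_, err = io.Copy(tw, gzr)
	return err
}

// roundTrip (hypothetical helper) gzips payload to a temp file, packs it with
// addGunzipped into a tar.gz, and returns the entry read back from it.
func roundTrip(payload []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "targz")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	var src bytes.Buffer
	zw := gzip.NewWriter(&src)
	if _, err := zw.Write(payload); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	path := filepath.Join(dir, "in.pdf.gz")
	if err := os.WriteFile(path, src.Bytes(), 0600); err != nil {
		return nil, err
	}

	var out bytes.Buffer
	gzw, err := gzip.NewWriterLevel(&out, gzip.BestSpeed)
	if err != nil {
		return nil, err
	}
	tw := tar.NewWriter(gzw)
	if err := addGunzipped(tw, path, "in.pdf"); err != nil {
		return nil, err
	}
	// Close the tar writer before the gzip writer so the trailer is flushed.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gzw.Close(); err != nil {
		return nil, err
	}

	gzr, err := gzip.NewReader(&out)
	if err != nil {
		return nil, err
	}
	tr := tar.NewReader(gzr)
	if _, err := tr.Next(); err != nil {
		return nil, err
	}
	return io.ReadAll(tr)
}

func main() {
	got, err := roundTrip(bytes.Repeat([]byte("sample line\n"), 1000))
	fmt.Println(len(got), err)
}
```

The trade-off is CPU time for memory: each file is decompressed twice, but only io.Copy's internal buffer is held at any moment. Decompressing to a temporary file and using its size would be another option if the double decompression is too slow.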

Posted on 2022-10-21 23:42:46
Source: https://go.coder-hub.com/74156099.html