Handling nested zip files with archive/zip

Question

I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse into a zip file and list all of the files it contains.

archive/zip gives you two functions for opening a zip archive:

  • zip.NewReader
  • zip.OpenReader

OpenReader opens a file on disk. NewReader accepts an io.ReaderAt and a file size. As you iterate through the archive with either of these, you get a zip.File for each file inside the zip. To get the contents of file f, you call f.Open, which returns an io.ReadCloser. To open a nested zip file, I'd need to use NewReader, but neither zip.File nor the io.ReadCloser returned by f.Open satisfies the io.ReaderAt interface.

zip.File has a private field zipr, which is an io.ReaderAt, and zip.ReadCloser (what OpenReader returns) has a private field f, an *os.File, which would satisfy the requirements for NewReader.

My question: is there any way to open a nested zip file without first writing its contents to a file on disk or reading the whole thing into memory?

It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.

Answer 1

Score: 2

How about building an io.ReaderAt from an io.Reader that reinitializes itself whenever you go backwards? (This code is largely untested, but hopefully you get the idea.)

package main

import (
	"io"
	"io/ioutil"
	"os"
	"strings"
)

type inefficientReaderAt struct {
	rdr    io.ReadCloser
	cur    int64
	initer func() (io.ReadCloser, error)
}

func newInefficientReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
	return &inefficientReaderAt{
		initer: initer,
	}
}

// Read reads from the current stream position and advances cur.
func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
	n, err = r.rdr.Read(p)
	r.cur += int64(n)
	return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
	// reset on rewind: close the old stream and reopen from the beginning
	if off < r.cur || r.rdr == nil {
		if r.rdr != nil {
			r.rdr.Close()
		}
		r.cur = 0
		r.rdr, err = r.initer()
		if err != nil {
			return 0, err
		}
	}

	// skip forward to off, keeping cur in sync with what was consumed
	if off > r.cur {
		sz, err := io.CopyN(ioutil.Discard, r.rdr, off-r.cur)
		r.cur += sz
		if err != nil {
			return 0, err
		}
	}

	return r.Read(p)
}

func main() {
	r := newInefficientReaderAt(func() (io.ReadCloser, error) {
		return ioutil.NopCloser(strings.NewReader("ABCDEFG")), nil
	})

	io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3))
	io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3))
}

If you mostly move forward, this probably works OK, especially if you use a buffered reader.

  • I should note that this violates the io.ReaderAt guarantees (https://godoc.org/io#ReaderAt): it isn't safe for parallel calls to ReadAt, and it doesn't block until p is full (it can return a short read with a nil error), so it may not even work properly.

Answer 2

Score: 0

I ran into exactly the same need and came up with the following approach; not sure if it's any help to you:

import (
	"archive/zip"
	"bytes"
	"io"
	"io/ioutil"
)

// NewZipFromReader opens a zip archive from file. If file does not
// implement io.ReaderAt, its whole contents are buffered in memory first
// and size is replaced by the buffered length.
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
	ra, ok := file.(io.ReaderAt)
	if !ok {
		buffer, err := ioutil.ReadAll(file)
		if err != nil {
			return nil, err
		}
		ra = bytes.NewReader(buffer)
		size = int64(len(buffer))
	}
	return zip.NewReader(ra, size)
}

So if file doesn't implement io.ReaderAt, it reads the whole contents into a buffer.

It's probably not safe against ZIP bombs, and it will definitely fail with an out-of-memory error for files larger than available RAM.

huangapple
  • Published 2016-10-26 00:46:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/40245442.html