2016-10-26 00:46:27 · go

Handling nested zip files with archive/zip

Question

I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse a zip file and list all of the files it contains.

archive/zip gives you two functions for opening a zip file:

zip.NewReader
zip.OpenReader

OpenReader opens a file on disk. NewReader accepts an io.ReaderAt and a file size. As you iterate through the zipped files with either of these, you get a zip.File for each file inside the zip. To get the contents of file f, you call f.Open, which gives you an io.ReadCloser. To open a nested zip file, I'd need to use NewReader, but neither zip.File nor the io.ReadCloser returned by f.Open satisfies the io.ReaderAt interface.

zip.File has a private field zipr, which is an io.ReaderAt, and zip.ReadCloser (the type returned by OpenReader) has a private field f, which is an *os.File; either would satisfy the requirements for NewReader.

My question: is there any way to open a nested zip file without first writing the contents to a file on disk or reading the whole thing into memory?

It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.
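
The obstacle can be reproduced in a few lines. The following is a minimal sketch, not part of the original question: buildZip and entryIsReaderAt are hypothetical helper names, and the archives are built in memory purely for illustration. It opens the first entry of an outer archive and checks whether the stream returned by f.Open can be type-asserted to io.ReaderAt.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildZip (hypothetical helper) returns an in-memory archive with one entry.
func buildZip(name string, data []byte) []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	fw, err := w.Create(name)
	if err != nil {
		panic(err)
	}
	if _, err := fw.Write(data); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// entryIsReaderAt (hypothetical helper) opens the first entry of the archive
// and reports whether the returned stream supports random access.
func entryIsReaderAt(archive []byte) bool {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		panic(err)
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		panic(err)
	}
	defer rc.Close()
	_, ok := rc.(io.ReaderAt)
	return ok
}

func main() {
	inner := buildZip("hello.txt", []byte("hi"))
	outer := buildZip("inner.zip", inner)
	// The result tells us whether inner.zip could be handed to zip.NewReader directly.
	fmt.Println("entry stream is an io.ReaderAt:", entryIsReaderAt(outer))
}
```

If the assertion fails, the entry has to be adapted in some way before zip.NewReader will accept it, which is what the answers address.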

Answer 1

Score: 2

How about building an io.ReaderAt on top of an io.Reader, one that reinitializes the underlying reader whenever you go backwards? (This code is largely untested, but hopefully you get the idea.)

package main

import (
	"io"
	"io/ioutil"
	"os"
	"strings"
)

// inefficientReaderAt emulates io.ReaderAt on top of a forward-only reader.
type inefficientReaderAt struct {
	rdr    io.ReadCloser                 // current stream, nil until first use
	cur    int64                         // offset of the next byte rdr returns
	initer func() (io.ReadCloser, error) // reopens the stream at offset 0
}

func newInefficientReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
	return &inefficientReaderAt{
		initer: initer,
	}
}

func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
	n, err = r.rdr.Read(p)
	r.cur += int64(n)
	return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
	// Reopen the stream on first use or when asked to go backwards.
	if off < r.cur || r.rdr == nil {
		if r.rdr != nil {
			r.rdr.Close()
		}
		r.cur = 0
		r.rdr, err = r.initer()
		if err != nil {
			return 0, err
		}
	}
	// Skip forward to the requested offset, keeping cur in sync.
	if off > r.cur {
		skipped, err := io.CopyN(ioutil.Discard, r.rdr, off-r.cur)
		r.cur += skipped
		if err != nil {
			return 0, err
		}
	}
	return r.Read(p)
}

func main() {
	r := newInefficientReaderAt(func() (io.ReadCloser, error) {
		return ioutil.NopCloser(strings.NewReader("ABCDEFG")), nil
	})
	io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3)) // writes "ABC"
	io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3)) // rewinds, then writes "BCD"
}

If you mostly move forwards, this probably works OK, especially if you use a buffered reader.

I should note that this violates the io.ReaderAt guarantees (https://godoc.org/io#ReaderAt): it doesn't allow parallel calls to ReadAt, and it doesn't block until a full read completes, so it may not even work properly.

Answer 2

Score: 0

I ran into the exact same need and came up with the following approach; I'm not sure if it's any help to you:

import (
	"archive/zip"
	"bytes"
	"io"
	"io/ioutil"
)

// NewZipFromReader returns a zip.Reader for file. If file already implements
// io.ReaderAt it is used directly with the given size; otherwise the whole
// stream is buffered in memory and size is replaced by the buffered length.
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
	if ra, ok := file.(io.ReaderAt); ok {
		return zip.NewReader(ra, size)
	}
	buffer, err := ioutil.ReadAll(file)
	if err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buffer), int64(len(buffer)))
}

So if file doesn't implement io.ReaderAt, it reads the whole contents into a buffer.

It's probably not safe against ZIP bombs, and it will definitely fail with OOM for files larger than RAM.
