Handling nested zip files with archive/zip
Question
I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse a zip file and list all of the files it contains.
archive/zip gives you two methods for handling a zip file:
- zip.NewReader
- zip.OpenReader
`OpenReader` opens a file on disk. `NewReader` accepts an `io.ReaderAt` and a file size. As you iterate through the zipped files with either of these, you get a `zip.File` for each file inside the zip. To get the contents of file `f`, you call `f.Open`, which gives you a `zip.ReadCloser`. To open a nested zip file, I'd need to use `NewReader`, but `zip.File` and `zip.ReadCloser` do not satisfy the `io.ReaderAt` interface.
`zip.File` has a private field `zipr`, which is an `io.ReaderAt`, and `zip.ReadCloser` has a private field `f`, which is an `os.File`; either should satisfy the requirements of `NewReader`.
My question: is there any way to open a nested zip file without first writing the contents to a file on disk, or reading the whole thing into memory?
It looks like everything that is needed is available in `zip.File`, but isn't exported. I'm hoping I missed something.
Answer 1
Score: 2
How about an `io.ReaderAt` built from an `io.Reader` that reinitializes whenever you need to go backwards? (This code is largely untested, but hopefully you get the idea.)
```go
package main

import (
	"io"
	"io/ioutil"
	"os"
	"strings"
)

type inefficientReaderAt struct {
	rdr    io.ReadCloser
	cur    int64
	initer func() (io.ReadCloser, error)
}

func newInefficentReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
	return &inefficientReaderAt{
		initer: initer,
	}
}

func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
	n, err = r.rdr.Read(p)
	r.cur += int64(n)
	return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
	// reset on rewind
	if off < r.cur || r.rdr == nil {
		r.cur = 0
		r.rdr, err = r.initer()
		if err != nil {
			return 0, err
		}
	}
	if off > r.cur {
		sz, err := io.CopyN(ioutil.Discard, r.rdr, off-r.cur)
		n = int(sz)
		r.cur += sz // track the skipped bytes, or the next offset check is wrong
		if err != nil {
			return n, err
		}
	}
	return r.Read(p)
}

func main() {
	r := newInefficentReaderAt(func() (io.ReadCloser, error) {
		return ioutil.NopCloser(strings.NewReader("ABCDEFG")), nil
	})
	io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3))
	io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3))
}
```
If you mostly move forwards, this probably works OK, especially if you use a buffered reader.
- I should note that this violates the `io.ReaderAt` guarantees (https://godoc.org/io#ReaderAt): namely, it doesn't allow parallel calls to `ReadAt`, and it doesn't block until it has read the full data, so this may not even work properly.
Answer 2
Score: 0
I ran into the exact same need and came up with the following approach; not sure if it's any help to you:
```go
// NewZipFromReader ...
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
	in := file.(io.Reader)
	if _, ok := in.(io.ReaderAt); !ok {
		buffer, err := ioutil.ReadAll(in)
		if err != nil {
			return nil, err
		}
		in = bytes.NewReader(buffer)
		size = int64(len(buffer))
	}
	reader, err := zip.NewReader(in.(io.ReaderAt), size)
	if err != nil {
		return nil, err
	}
	return reader, nil
}
```
So if `file` doesn't implement `io.ReaderAt`, it reads the whole contents into a buffer.
It's probably not safe against ZIP bombs, and it will definitely fail with OOM for files larger than RAM.