How to read zipfile from stdin

Question

I'm trying to read a zip file from stdin in Python, but I keep running into problems. I want to be able to run cat test.xlsx | python3 test.py and create a valid zipfile.ZipFile object, without first writing a temporary file if possible.

My initial approach was this, but ZipFile complained that the file is not seekable:

import sys
import zipfile

zipfile.ZipFile(sys.stdin)

So I changed it to the following, but now it complains that the input is not a valid zip file:

import io
import sys
import zipfile

zipfile.ZipFile(io.StringIO(sys.stdin.read()))

Can this be solved without writing the zip to a temporary file?

Answer 1

Score: 3

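Zip files are binary data, not UTF-8 encoded text. sys.stdin is a text stream, so sys.stdin.read() tries to decode the input into a str, which for a zip file will typically fail right away with UnicodeDecodeError: 'utf-8' codec can't decode byte ... And even when decoding happens to succeed, io.StringIO holds str rather than the bytes that ZipFile needs.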
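
To see why the text route fails, here is a small sketch that builds an archive in memory and tries to decode it as UTF-8. The member name data.bin and its contents are made up for the illustration; the payload is stored uncompressed (the zipfile default), so it is guaranteed to contain bytes that are not valid UTF-8.

```python
import io
import zipfile

# Build a tiny archive in memory. ZipFile uses ZIP_STORED by default,
# so the member's bytes appear unmodified inside the archive.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    archive.writestr("data.bin", bytes(range(256)))  # hypothetical member
raw = archive_bytes.getvalue()

# Decoding the archive as text is what sys.stdin.read() effectively attempts.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print("starts with PK signature:", raw[:2] == b"PK")
print("decodes as UTF-8:", decoded_ok)
```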

Instead, you can access the underlying binary buffer object, sys.stdin.buffer, to read stdin as raw bytes. Pair that with io.BytesIO to get an in-memory, seekable file-like object:

zipfile.ZipFile(io.BytesIO(sys.stdin.buffer.read()))
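
As a slightly fuller sketch of the same idea, the one-liner can be wrapped in a helper and tried out without a real pipe. The helper name open_zip_from_stream and the member hello.txt are assumptions for this example; in test.py you would pass sys.stdin.buffer as the stream.

```python
import io
import zipfile


def open_zip_from_stream(binary_stream):
    """Read a whole binary stream into memory and open it as a ZipFile."""
    # BytesIO is seekable, which ZipFile needs in order to locate the
    # central directory stored at the end of the archive.
    return zipfile.ZipFile(io.BytesIO(binary_stream.read()))


# Build a small archive in memory to stand in for the piped input.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    archive.writestr("hello.txt", "hello from stdin")

# In test.py the argument would be sys.stdin.buffer instead.
with open_zip_from_stream(io.BytesIO(archive_bytes.getvalue())) as zip_file:
    member_names = zip_file.namelist()
    hello_text = zip_file.read("hello.txt").decode("utf-8")

print(member_names, hello_text)
```

Note that this holds the entire archive in memory, which is fine for a typical .xlsx but may matter for very large inputs.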

Alternatively, if you provide a seekable stdin (for example, by redirecting stdin instead of streaming from a pipe), you can operate on sys.stdin.buffer directly:

zipfile.ZipFile(sys.stdin.buffer)

paired with a shell redirect such as:

python3 test.py <test.xlsx
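
The difference between the two invocations comes down to what kind of file descriptor ends up as stdin. A rough sketch, assuming a POSIX-like system: a pipe reports itself as not seekable, while a regular file does. The temporary file here only stands in for the redirected test.xlsx in this demonstration; it is not part of the solution.

```python
import os
import tempfile

# A pipe, like the one the shell creates for `cat test.xlsx | python3 test.py`.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as pipe_reader, os.fdopen(write_fd, "wb"):
    pipe_is_seekable = pipe_reader.seekable()

# A regular file, like stdin under `python3 test.py <test.xlsx`.
with tempfile.TemporaryFile() as regular_file:
    file_is_seekable = regular_file.seekable()

print("pipe seekable:", pipe_is_seekable)
print("file seekable:", file_is_seekable)
```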

If you want to support both cases, you can choose between the two approaches by asking the IO object whether it is seekable via its seekable() method:

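import io
import sys
import zipfile

if sys.stdin.buffer.seekable():
    # stdin was redirected from a regular file, so ZipFile can seek in it directly
    zip_file = zipfile.ZipFile(sys.stdin.buffer)
else:
    # stdin is a pipe: copy it into a seekable in-memory buffer first
    buffer = io.BytesIO(sys.stdin.buffer.read())
    zip_file = zipfile.ZipFile(buffer)

print(zip_file.filelist)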
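
If it helps, the same selection logic can be sketched as a reusable function that accepts any binary stream, which also makes it possible to try both branches without a shell. The names zip_from_binary_stream and NonSeekableStream are hypothetical; the latter is only a stand-in for a pipe.

```python
import io
import zipfile


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a pipe: readable but not seekable."""

    def __init__(self, data):
        super().__init__()
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        return self._inner.readinto(buffer)


def zip_from_binary_stream(stream):
    """Open stream as a ZipFile, buffering in memory only when needed."""
    if stream.seekable():
        return zipfile.ZipFile(stream)
    return zipfile.ZipFile(io.BytesIO(stream.read()))


# Build a small archive in memory to feed both branches.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    archive.writestr("a.txt", "alpha")
payload = archive_bytes.getvalue()

names_from_pipe = zip_from_binary_stream(NonSeekableStream(payload)).namelist()
names_from_file = zip_from_binary_stream(io.BytesIO(payload)).namelist()
print(names_from_pipe, names_from_file)
```

In test.py the call would be zip_from_binary_stream(sys.stdin.buffer).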

huangapple
  • Published on July 27, 2023, 22:28:28
  • Please keep this link when reposting: https://go.coder-hub.com/76780741.html