How to read a zipfile from stdin
Question
I'm trying to read a zip file from stdin in Python, but I keep running into problems. What I want is to be able to run cat test.xlsx | python3 test.py and create a valid zipfile.ZipFile object without first writing a temporary file, if possible.
My initial approach was this, but ZipFile complained that the file is not seekable:
import sys
import zipfile
zipfile.ZipFile(sys.stdin)
So I changed it around, but now it complains that this is not a valid zip file:
import io
import sys
import zipfile
zipfile.ZipFile(io.StringIO(sys.stdin.read()))
Can this be solved without writing the zip to a temporary file?
Answer 1
Score: 3
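Zip files are binary data, not UTF-8 encoded text. Reading the file into a str with sys.stdin.read() decodes it with stdin's text encoding, which under a UTF-8 locale typically fails right away with a UnicodeDecodeError: 'utf-8' codec can't decode byte ... error. Even when the decode happens to succeed, ZipFile needs a binary file object, so wrapping the resulting str in io.StringIO still fails with a "not a zip file" error.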
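A small sketch of that difference, assuming CPython's zipfile module and using an archive built in memory as a stand-in for test.xlsx (the member name sheet.xml is purely illustrative). It decodes the bytes as latin-1, an encoding that accepts every byte value and is chosen here only so the decode step cannot fail, to show that a str-backed io.StringIO is still rejected while the same bytes in io.BytesIO open normally.

```python
import io
import zipfile

# Build a tiny archive in memory as a stand-in for test.xlsx.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sheet.xml", "<data/>")
zip_bytes = buf.getvalue()

# Assumption for illustration: decode with latin-1, which never raises,
# to isolate the text-vs-binary problem from the UnicodeDecodeError.
as_text = zip_bytes.decode("latin-1")
try:
    zipfile.ZipFile(io.StringIO(as_text))
    text_mode_ok = True
except zipfile.BadZipFile:
    # ZipFile cannot parse a text (str) stream, even one holding every byte.
    text_mode_ok = False

# The original bytes in a binary, seekable BytesIO open fine.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    names = zf.namelist()

print(text_mode_ok, names)
```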
Instead, you can access the underlying binary buffer object to read stdin as raw bytes. Pair that with io.BytesIO to get an in-memory, seekable file-like object:
zipfile.ZipFile(io.BytesIO(sys.stdin.buffer.read()))
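To see the BytesIO approach work on a genuinely non-seekable stream, the sketch below pushes a small in-memory archive through os.pipe(), which behaves like the pipe the shell creates for cat test.xlsx | python3 test.py. It assumes a POSIX system, where pipes report seekable() as False, and an archive small enough to fit in the pipe's buffer, because it writes everything before reading. The member name hello.txt is hypothetical.

```python
import io
import os
import zipfile

# Build a small archive in memory as a stand-in for test.xlsx.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hello from the pipe")
payload = archive.getvalue()

# Write the archive into an OS pipe, as `cat` would.
# Assumption: the payload fits in the pipe buffer, so this write cannot block.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "wb") as writer:
    writer.write(payload)

# Read the non-seekable end fully into memory, then wrap it in BytesIO.
with os.fdopen(read_fd, "rb") as reader:
    pipe_was_seekable = reader.seekable()
    in_memory = io.BytesIO(reader.read())

# The in-memory copy is seekable, so ZipFile can parse it.
with zipfile.ZipFile(in_memory) as zf:
    names = zf.namelist()
    content = zf.read("hello.txt")

print(pipe_was_seekable, names, content)
```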
Alternatively, if you provide a seekable stdin (for example, by redirecting stdin from a file instead of streaming it through a pipe), you can operate on sys.stdin.buffer directly:
zipfile.ZipFile(sys.stdin.buffer)
paired with an invocation such as:
python3 test.py <test.xlsx
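With a redirect, the shell hands Python an ordinary file descriptor, so the binary stream is seekable and ZipFile can read it in place without a BytesIO copy. The sketch below approximates that situation with tempfile.TemporaryFile() standing in for the redirected test.xlsx; the temporary file exists only to simulate the redirect, and hello.txt is again a hypothetical member name.

```python
import tempfile
import zipfile

with tempfile.TemporaryFile() as f:
    # Write a small archive into the file, standing in for test.xlsx on disk.
    with zipfile.ZipFile(f, "w") as zf:
        zf.writestr("hello.txt", "hi")
    f.seek(0)  # rewind, as a freshly redirected stdin starts at offset 0

    file_is_seekable = f.seekable()
    # A seekable binary file can be handed to ZipFile directly.
    with zipfile.ZipFile(f) as zf:
        names = zf.namelist()

print(file_is_seekable, names)
```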
If you want to support both cases, you can choose between them depending on whether stdin is seekable, by calling the IO object's seekable() method:
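import io
import sys
import zipfile
if sys.stdin.buffer.seekable():
    # stdin redirected from a file (python3 test.py <test.xlsx): use it directly
    zip_file = zipfile.ZipFile(sys.stdin.buffer)
else:
    # stdin is a pipe (cat test.xlsx | python3 test.py): buffer it in memory
    buffer = io.BytesIO(sys.stdin.buffer.read())
    zip_file = zipfile.ZipFile(buffer)
print(zip_file.filelist)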
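The same check can be packaged as a small helper so the rest of a script does not care how stdin was supplied. This is a sketch: open_zip_stream is a hypothetical name, and it assumes the caller passes a binary stream such as sys.stdin.buffer rather than the text-mode sys.stdin.

```python
import io
import zipfile
from typing import BinaryIO


def open_zip_stream(stream: BinaryIO) -> zipfile.ZipFile:
    """Open a zip archive from a binary stream such as sys.stdin.buffer.

    Seekable streams (e.g. stdin redirected from a file) are passed to
    ZipFile as-is; non-seekable ones (e.g. a pipe) are first read fully
    into an in-memory BytesIO.
    """
    if stream.seekable():
        return zipfile.ZipFile(stream)
    return zipfile.ZipFile(io.BytesIO(stream.read()))
```

In test.py this would be called as open_zip_stream(sys.stdin.buffer), which handles both cat test.xlsx | python3 test.py and python3 test.py <test.xlsx. The pipe branch holds the whole archive in memory, which is the price of avoiding a temporary file.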