PyPDF2: Creating EncodedStreamObject is not currently supported
Question
The following code tries to edit part of text in a PDF file:
    from PyPDF2 import PdfReader, PdfWriter

    replacements = [("Failed", "Passed")]

    pdf = PdfReader(open("2.pdf", "rb"))
    writer = PdfWriter()

    for page in pdf.pages:
        contents = page.get_contents().get_data()
        # print(contents)  old contents
        for (a, b) in replacements:
            contents = contents.replace(str.encode(a), str.encode(b))
        # print(contents)  new contents, which has 'Passed' as the new value
        page.get_contents().set_data(str(contents))  # Issue occurs here
        writer.add_page(page)

    with open("2_modified.pdf", "wb") as f:
        writer.write(f)
I keep running into the following error:
> Traceback (most recent call last): <br>
File "/pdf_editor.py", line 14, in <module> <br>
page.get_contents().set_data(str(contents)) #Issue occurs here <br>
File "/venv/lib/python3.9/site-packages/PyPDF2/generic/_data_structures.py", line 839, in set_data <br>
raise PdfReadError("Creating EncodedStreamObject is not currently supported") <br>
PyPDF2.errors.PdfReadError: Creating EncodedStreamObject is not currently supported
I tried the solutions mentioned here, which did not work. I also found this GitHub link, which has a "bug" label but no further updates.
UPDATE: <br>
I had tried the library suggested in the comments earlier, but did not pursue it for two reasons:
- It does not seem to be widely used
- I kept hitting one issue or another, the last one being an 'apply_redact_annotations' error
So I would like to know whether there is any other workaround, or any other good library, to achieve this.
Answer 1
Score: 1
I am answering the question rather than the title. While PyPDF2 (now merged back into pypdf) can decode encoded stream objects on the fly to get at their data, it does not support implicit encoding. While it is probably possible to create encoded streams explicitly, I find it easier just to work on fully decoded documents. I like using `qpdf --qdf in.pdf uncompressed.pdf` for that.
By the way, "encoded" here essentially means "compressed" (Deflate, i.e. the /FlateDecode filter, is the most common choice).
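To illustrate why an encoded stream cannot simply be edited in place, here is a minimal sketch using only Python's standard `zlib` module rather than PyPDF2. The content stream bytes and the helper name `replace_in_encoded_stream` are made up for the example. It shows the decode, edit, re-encode round trip that the library does not do implicitly:

```python
import zlib

# Hypothetical page content stream: PDF text operators around the target word.
raw_stream = b"BT /F1 12 Tf 72 720 Td (Status: Failed) Tj ET"

# Roughly what a Deflate-encoded (/FlateDecode) stream holds in the file.
encoded = zlib.compress(raw_stream)


def replace_in_encoded_stream(data: bytes, old: bytes, new: bytes) -> bytes:
    """Decode the stream, edit the plain bytes, then encode it again."""
    decoded = zlib.decompress(data)      # compressed bytes -> readable operators
    edited = decoded.replace(old, new)   # bytes-to-bytes replacement
    return zlib.compress(edited)         # re-encode before storing it back


new_encoded = replace_in_encoded_stream(encoded, b"Failed", b"Passed")
print(zlib.decompress(new_encoded))
```

Working on a file that has already been decoded (for example with the qpdf command above) removes the need for the first and last step. Note also that the data stays `bytes` throughout: calling `str()` on a `bytes` object produces its `b'...'` representation rather than the decoded text.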