PyPDF2: Creating EncodedStreamObject is not currently supported

Question

The following code tries to replace part of the text in a PDF file:

from PyPDF2 import PdfReader, PdfWriter

replacements = [("Failed", "Passed")]

pdf = PdfReader(open("2.pdf", "rb"))
writer = PdfWriter()

for page in pdf.pages:
    contents = page.get_contents().get_data()
    #print(contents) old contents
    for (a, b) in replacements:
        contents = contents.replace(str.encode(a), str.encode(b))
    #print(contents) new contents which has 'Passed' as new value
    page.get_contents().set_data(str(contents)) #Issue occurs here
    writer.add_page(page)

with open("2_modified.pdf", "wb") as f:
    writer.write(f)

It keeps failing with the following error:

Traceback (most recent call last):
  File "/pdf_editor.py", line 14, in <module>
    page.get_contents().set_data(str(contents)) #Issue occurs here
  File "/venv/lib/python3.9/site-packages/PyPDF2/generic/_data_structures.py", line 839, in set_data
    raise PdfReadError("Creating EncodedStreamObject is not currently supported")
PyPDF2.errors.PdfReadError: Creating EncodedStreamObject is not currently supported

I tried the solutions mentioned here, which did not work. I also found this GitHub link, which has the label "bug" but no further updates.

UPDATE:
I had tried the library suggested in the comments earlier, but did not pursue it for two reasons:

  1. It does not seem to be widely used
  2. I kept running into one issue or another, the last one being an 'apply_redact_annotations' error

So I would like to know of any other workaround, or any other good library, to achieve this.

Answer 1

Score: 1

I am answering the question rather than the title. While PyPDF2 (now merged back into pypdf) can decode encoded stream objects on the fly to get at their data, it does not support implicit encoding: calling set_data on an encoded stream raises the error shown in the question. It is probably possible to create encoded streams explicitly, but I find it easier to work on fully decoded documents. I like using qpdf --qdf in.pdf uncompressed.pdf for that.
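
If you would rather run that qpdf step from the same Python script, a minimal sketch could look like the one below. It assumes qpdf is installed and on PATH. The helper names qdf_command and uncompress_pdf are made up for illustration and are not part of PyPDF2 or qpdf.

```python
import subprocess
from typing import List


def qdf_command(src: str, dst: str) -> List[str]:
    """Build the qpdf call from the answer: qpdf --qdf in.pdf uncompressed.pdf."""
    return ["qpdf", "--qdf", src, dst]


def uncompress_pdf(src: str, dst: str) -> None:
    """Write a QDF (uncompressed) copy of src to dst. Assumes qpdf is on PATH."""
    result = subprocess.run(qdf_command(src, dst))
    # qpdf documents exit status 3 as "succeeded with warnings"; treat it as success.
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed with exit status {result.returncode}")
```

Calling uncompress_pdf("2.pdf", "2_uncompressed.pdf") should leave the content streams stored uncompressed in the second file, so they can be inspected or edited as plain bytes.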

By the way, "encoded" here essentially means "compressed". Deflate (the /FlateDecode filter in PDF terms) is the most common choice.
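
To make the decode/encode distinction concrete, here is a small sketch using only Python's zlib module, which implements Deflate. It is not PyPDF2 code: the content stream is invented for illustration, replace_in_flate_stream is a hypothetical helper, and a real PDF writer would also have to update the stream's /Length entry.

```python
import zlib


def replace_in_flate_stream(encoded: bytes, old: bytes, new: bytes) -> bytes:
    """Decode a Deflate-compressed stream, replace bytes, and re-encode it."""
    decoded = zlib.decompress(encoded)   # what a PDF reader does on the fly
    edited = decoded.replace(old, new)   # replacement only makes sense on decoded bytes
    return zlib.compress(edited)         # the explicit re-encoding step


# Hypothetical content stream, invented for illustration.
original = zlib.compress(b"BT /F1 12 Tf (Failed) Tj ET")
updated = replace_in_flate_stream(original, b"Failed", b"Passed")
print(zlib.decompress(updated))  # b'BT /F1 12 Tf (Passed) Tj ET'

# Keep the data as bytes: str() on bytes yields the repr, not the text.
print(str(b"Passed"))  # b'Passed'
```

The last line is also worth noting for the code in the question: str(contents) turns the bytes into their printed representation, so the data is better passed along as bytes.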

huangapple
  • Posted on March 12, 2023 at 18:27:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75712497.html