How can I edit/modify/replace text in an existing PDF file?

Question

I am working on my final-year project: a website where users can read PDFs. I am adding features such as converting currency amounts into the user's local currency. I am using Flask and PyMuPDF, but I don't know how to modify the text in a PDF. Can anyone help me with this problem?

I heard that PyMuPDF or pypdf can do this, but I didn't find any solution for replacing text.

Answer 1

Score: 3

Using PyMuPDF's redaction facility is probably the right approach here.
The approach:

  1. Identify the location of the text to replace
  2. Erase the text and replace it using redactions

Care must be taken to match the original font and to handle replacement text that is longer or shorter than the original.

import fitz  # PyMuPDF

doc = fitz.open("myfile.pdf")
number = 0  # placeholder: 0-based index of the page to edit
page = doc[number]
# suppose you want to replace all occurrences of some text
disliked = "delete this"
better = "better text"
hits = page.search_for(disliked)  # list of rectangles where to replace

for rect in hits:
    # further parameters (e.g. text_color, fill) can be passed as well
    page.add_redact_annot(rect, better, fontname="helv", fontsize=11,
                          align=fitz.TEXT_ALIGN_CENTER)

page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # don't touch images
doc.save("replaced.pdf", garbage=3, deflate=True)

This works well for short text and moderate quality expectations.

With some more effort, the original font properties, color, font size, etc. can be identified to produce a close-to-perfect result.

huangapple
  • Published on February 8, 2023 at 22:36:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75387339.html