How can I edit/modify/replace text in an existing PDF file?
Question
I am working on my final-year project: a website where users can read PDFs. I am adding features such as converting currency amounts into the user's local currency. I am using Flask and PyMuPDF, but I don't know how to modify the text in a PDF. Can anyone help me with this problem?
I have read that PyMuPDF or pypdf can do this, but I couldn't find a solution for replacing text.
Answer 1
Score: 3
Using PyMuPDF's redaction facility is probably the appropriate approach.
The approach:
- Identify the location of the text to replace
- Erase the text and replace it using redactions
Care must be taken to match the original font and to handle new text that is longer or shorter than the original.
import fitz  # PyMuPDF

doc = fitz.open("myfile.pdf")
number = 0  # example value: index of the page to edit (0-based)
page = doc[number]

# Suppose you want to replace all occurrences of some text.
disliked = "delete this"
better = "better text"
hits = page.search_for(disliked)  # list of rectangles where to replace

for rect in hits:
    # More parameters (e.g. text_color, fill) can be passed here.
    page.add_redact_annot(rect, better, fontname="helv", fontsize=11,
                          align=fitz.TEXT_ALIGN_CENTER)

page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # don't touch images
doc.save("replaced.pdf", garbage=3, deflate=True)
This works well with short text and medium quality expectations.
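For replacement text that is longer than the original, one option is to shrink the font size until the text fits the width of the original rectangle. The following is only a sketch: `fit_fontsize` and `rough_measure` are hypothetical helpers, not part of PyMuPDF, and `rough_measure` is a crude stand-in that assumes each character is about half the font size wide.

```python
def fit_fontsize(text, rect_width, fontsize, measure, min_size=4.0):
    """Return the largest size <= fontsize at which text fits rect_width.

    measure(text, size) must return the rendered width of text at that size.
    """
    width = measure(text, fontsize)
    if width <= rect_width:
        return fontsize
    # Text width scales linearly with font size, so scale down proportionally.
    return max(min_size, fontsize * rect_width / width)


def rough_measure(text, fontsize):
    # Assumption for illustration only: average glyph width = 0.5 * fontsize.
    return 0.5 * fontsize * len(text)


size = fit_fontsize("better text", 40.0, 11, rough_measure)
print(round(size, 2))
```

With PyMuPDF available, `measure` could instead wrap `fitz.get_text_length(text, fontname="helv", fontsize=size)`, and `rect_width` could be the `width` of each rectangle returned by `page.search_for`. The result would then be passed as `fontsize` to `add_redact_annot`.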
With some more effort, the original font properties, color, font size, etc. can be identified to produce a close-to-perfect result.
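As a rough sketch of that extra effort: `page.get_text("dict")` returns blocks, lines and spans, and each span carries keys such as "font", "size", "color" (an sRGB integer) and "bbox". The helpers below (`srgb_int_to_rgb`, `overlaps`, `style_at`) are hypothetical names, and `sample` is a hand-built dictionary that only mimics that layout so the example runs without a PDF.

```python
def srgb_int_to_rgb(color):
    """Convert an sRGB integer such as 0xFF0000 to (r, g, b) floats in 0..1."""
    return (((color >> 16) & 255) / 255,
            ((color >> 8) & 255) / 255,
            (color & 255) / 255)


def overlaps(a, b):
    """True if rectangles a and b, given as (x0, y0, x1, y1), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def style_at(text_dict, rect):
    """Return font size, text color and font name of the first span overlapping rect."""
    for block in text_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                if overlaps(span["bbox"], rect):
                    return {
                        "fontsize": span["size"],
                        "text_color": srgb_int_to_rgb(span["color"]),
                        "font": span["font"],
                    }
    return None


# Hand-built stand-in for page.get_text("dict"); values are made up.
sample = {"blocks": [{"type": 0, "lines": [{"spans": [
    {"text": "delete this", "font": "Helvetica", "size": 9.5,
     "color": 0xFF0000, "bbox": (72, 100, 130, 112)},
]}]}]}

print(style_at(sample, (72, 100, 130, 112)))
```

In a real document, `text_dict` would come from `page.get_text("dict")` and `rect` from `page.search_for`. The "fontsize" and "text_color" values can be passed to `add_redact_annot`; the reported font name is the PDF's own name for the font and would still need to be mapped to a font name that `add_redact_annot` accepts, which this sketch leaves out.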