问题

我正在进行我的毕业项目，所以我正在开发一个网站，用户可以在上面阅读PDF。我正在添加一些功能，比如将货币转换为他们国家的货币。我正在使用Flask和PyMuPDF来完成我的项目，但我不知道如何修改PDF中的文本。有谁可以帮助我解决这个问题？

我听说可以使用PyMuPDF或PyPDF来实现，但我没有找到替换文本的解决方案。

英文:

I am working on my final year project, so I working on a website where a user can come and read PDF. I am adding some features such as converting currency to their country currency. I am using flask and pymuPDF for my project and I don't know how I can modify the text at a pdf
anyone can help me with this problem?

I heard here that using pymuPDF or pypdf can work, but I didn't find any solution for replacing text.

答案1

得分: 3

使用PyMuPDF的编辑功能可能是合适的做法。
方法如下：

标识要替换的文本位置
擦除文本并使用编辑替换它

必须小心获取原始字体以及新文本是否比原始文本长/短。

import fitz  # 导入PyMuPDF

doc = fitz.open("myfile.pdf")
page = doc[number]  # 页码从0开始
# 假设你想替换所有出现的某个文本
disliked = "delete this"
better = "better text"
hits = page.search_for("delete this")  # 要替换的矩形列表

for rect in hit:
    page.add_redact_annot(rect, better, fontname="helv", fontsize=11,
       align=fitz.TEXT_ALIGN_CENTER, ...)  # 更多参数

page.apply_annots(images=fitz.PDF_REDACT_IMAGE_NONE)  # 不触及图像
doc.save("replaced.pdf", garbage=3, deflate=True)

这适用于短文本和中等质量要求。

通过一些额外的努力，可以识别原始字体属性、颜色、字体大小等，以产生接近完美的结果。

英文:

Using the redaction facility of PyMuPDF is probably the adequate thing to do.
The approach:

Identify the location of the text to replace
Erase the text and replace it using redactions

Care must be taken to get hold of the original font, and whether or not the new text is longer / short than the original.

import fitz  # import PyMuPDF

doc = fitz.open(&quot;myfile.pdf&quot;)
page = doc[number]  # page number 0-based
# suppose you want to replace all occurrences of some text
disliked = &quot;delete this&quot;
better   = &quot;better text&quot;
hits = page.search_for(&quot;delete this&quot;)  # list of rectangles where to replace

for rect in hit:
    page.add_redact_annot(rect, better, fontname=&quot;helv&quot;, fontsize=11,
       align=fitz.TEXT_ALIGN_CENTER, ...)  # more parameters

page.apply_annots(images=fitz.PDF_REDACT_IMAGE_NONE)  # don&#39;t touch images
doc.save(&quot;replaced.pdf&quot;, garbage=3, deflate=True)

This works well with short text and medium quality expectations.

With some more effort, the original font properties, color, font size, etc. can be identified to produce a close-to-perfect result.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何编辑/修改/替换现有 PDF 文件中的文本？

问题

答案1

使用矩阵值计算期望值

使用Python中的文件操作删除文本

在Jinja2中渲染时如何转义双引号？

你可以如何重命名我的PyPI项目以修复其大小写问题？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论