March 1, 2023 11:32:24

Using pypdf to fix orientation problems within pdf document

Question

I am trying to put a drop-down annotation into a pdf page at coordinates (x,y).

The pdf I wish to add these annotations to is a pdf that comes from a scanned document. For some reason the scanner I am using produces a pdf page with the following content:

{'/Type': '/Page', '/Parent': IndirectObject(1, 0, 2500474218768), '/Rotate': 270, '/Resources': IndirectObject(6, 0, 2500474218768), '/MediaBox': [0.0, 0.0, 792, 612], '/CropBox': [0.0, 0.0, 792, 612], '/Contents': [IndirectObject(7, 0, 2500474218768)]}

This page has

/Rotate = 270 
/MediaBox = [0.0, 0.0, 792, 612]

So the printed pdf is in portrait, despite the landscape mediabox.

To place the annotation at (x,y), I need the coordinate systems to agree.

Instead of changing (x,y) to landscape (not sure how well that would work), I would like to change the orientation of the pdf that was scanned to some standard orientation that would allow me to place the annotation correctly at (x,y).

How can I use pypdf to do this?

I have tried setting page.rotate(-270) and page.mediabox.upper_right(page.mediabox.top,page.mediabox.right)

However the pdf no longer prints correctly after those transformations. I do not understand things well enough to know how to do this correctly.

Answer 1

Score: 2

Thank you very much for your comments on cpdf. Ultimately I did not use cpdf and found a solution using pypdf.

I would like to share this solution in case anyone might find it useful.

Most likely you are advanced enough at life to notice which way documents go into your scanner. However, if you're like me, you might be feeding your documents into the scanner in landscape position, which produces a PDF with the undesired properties described above. In that case, just rotate your documents 90 degrees by hand before you place them in the scanner; no code required.

If you want to correct the undesired rotation in an existing PDF, and the rotation is exactly like the one I described above, use the following pypdf calls.

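import pypdf
...
reader = pypdf.PdfReader(pdf_file)  # pdf_file: the scanned PDF, defined elsewhere
writer = pypdf.PdfWriter()
for page in reader.pages:
    # Apply the page's /Rotate value to its content and boxes, then reset /Rotate to 0
    page.transfer_rotation_to_content()
    writer.add_page(page)
...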

If that doesn't work, and you are struggling as hard as I was to figure out what a PDF even is, this was the only reference I found that helped me understand how the 3x3 transformation matrix is used to do the transformations you want. It makes sense once you make sense of it, but not a moment before.
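For readers who would rather convert (x, y) than rewrite the page, the 3x3 matrix idea can be sketched in plain Python, without pypdf. This is only an illustration under stated assumptions: /Rotate is taken as a clockwise display rotation (so 270 looks like 90 degrees counter-clockwise), points are row vectors [x, y, 1] multiplied on the left of the matrix as in PDF, and the media box is the 792 x 612 one from the question. All function names are made up for this sketch.

```python
import math


def rotation_matrix(degrees_ccw):
    """PDF-style matrix for a counter-clockwise rotation (multiples of 90 only)."""
    t = math.radians(degrees_ccw)
    c, s = round(math.cos(t)), round(math.sin(t))  # exact 0, 1 or -1 for multiples of 90
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]


def translation_matrix(tx, ty):
    return [[1, 0, 0], [0, 1, 0], [tx, ty, 1]]


def matmul(a, b):
    """With row vectors, matmul(a, b) means: apply a first, then b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, x, y):
    """Transform the point (x, y) by the matrix m."""
    return (x * m[0][0] + y * m[1][0] + m[2][0],
            x * m[0][1] + y * m[1][1] + m[2][1])


MEDIA_HEIGHT = 612  # height of the unrotated /MediaBox [0, 0, 792, 612]

# Unrotated page space -> what the viewer shows for /Rotate 270:
# rotate 90 degrees counter-clockwise, then shift back into positive coordinates.
to_display = matmul(rotation_matrix(90), translation_matrix(MEDIA_HEIGHT, 0))

# Displayed (portrait) space -> unrotated page space: the inverse steps in reverse order.
to_media = matmul(translation_matrix(-MEDIA_HEIGHT, 0), rotation_matrix(-90))

print(apply(to_media, 100, 700))    # (700, 512)
print(apply(to_display, 700, 512))  # (100, 700)
```

Here a point that appears at (100, 700) on the portrait page corresponds to (700, 512) in the page's own landscape coordinate system, which is where an annotation would have to be placed if the /Rotate entry were left untouched.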

Note: Answer provided by OP on question section.

Answer 2

Score: 0

This is a surprisingly complex process. You would need to set the rotation to zero, swap the x and y page size coordinates around, and then counter-rotate the page content to compensate. There are then complications like existing annotations which would need moving too. If PyPDF doesn't have a function to do this, it would be quite hard to write yourself.

Try preprocessing your files with cpdf -upright in.pdf -o out.pdf (or the upright function from https://github.com/coherentgraphics/python-libcpdf ) and then you can use PyPDF as normal.
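If the preprocessing step needs to be driven from Python, one possible sketch is to call the command line shown above through `subprocess`. The function names are hypothetical, and the sketch assumes the `cpdf` executable is installed and on the PATH; only the `-upright` and `-o` arguments quoted in this answer are used.

```python
import shutil
import subprocess


def cpdf_upright_command(src, dst):
    """Build the argument list for: cpdf -upright src -o dst"""
    return ["cpdf", "-upright", src, "-o", dst]


def make_upright(src, dst):
    """Run cpdf on src and write the upright result to dst."""
    if shutil.which("cpdf") is None:
        raise FileNotFoundError("cpdf executable not found on PATH")
    # check=True raises CalledProcessError if cpdf exits with an error.
    subprocess.run(cpdf_upright_command(src, dst), check=True)
```

A call such as `make_upright("in.pdf", "out.pdf")` would then produce a file that can be opened with pypdf as usual.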
