Using pypdf to fix orientation problems within a PDF document

Question

I am trying to put a drop-down annotation into a pdf page at coordinates (x,y).

The PDF I want to annotate comes from a scanned document. For some reason, the scanner I am using produces a PDF page with the following content:

{'/Type': '/Page', '/Parent': IndirectObject(1, 0, 2500474218768), '/Rotate': 270, '/Resources': IndirectObject(6, 0, 2500474218768), '/MediaBox': [0.0, 0.0, 792, 612], '/CropBox': [0.0, 0.0, 792, 612], '/Contents': [IndirectObject(7, 0, 2500474218768)]}

This page has

/Rotate = 270 
/MediaBox = [0.0, 0.0, 792, 612]

So the printed pdf is in portrait, despite the landscape mediabox.
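
One way to see why this happens: /Rotate is applied clockwise at display time, so a quarter turn swaps the width and height of the media box. A minimal sketch of that arithmetic follows; the function name displayed_size is made up for illustration and is not part of pypdf.

```python
def displayed_size(mediabox, rotate):
    """Return (width, height) of the page as a viewer shows it.

    mediabox is [left, bottom, right, top] in unrotated user space;
    rotate is the /Rotate value (a multiple of 90, applied clockwise).
    """
    left, bottom, right, top = mediabox
    width, height = right - left, top - bottom
    if rotate % 180 == 90:  # 90 or 270: a quarter turn swaps the sides
        return (height, width)
    return (width, height)


# Values taken from the scanned page above: landscape box, shown as portrait.
print(displayed_size([0.0, 0.0, 792, 612], 270))
```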

To place the annotation at (x,y), I need the coordinate systems to agree.

Instead of changing (x,y) to landscape (not sure how well that would work), I would like to change the orientation of the pdf that was scanned to some standard orientation that would allow me to place the annotation correctly at (x,y).
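
For comparison, the alternative set aside here (converting (x,y) rather than the page) could look roughly like the sketch below. It assumes that (x,y) is measured from the bottom-left corner of the page as displayed, that the MediaBox starts at (0, 0), and that annotation coordinates live in the unrotated page coordinate system. The name visual_to_user is hypothetical and not a pypdf function.

```python
def visual_to_user(x, y, width, height, rotate):
    """Map a point on the displayed page to unrotated page coordinates.

    width and height are the MediaBox dimensions before rotation;
    rotate is the /Rotate value (clockwise, a multiple of 90).
    """
    rotate %= 360
    if rotate == 0:
        return (x, y)
    if rotate == 90:
        return (width - y, x)
    if rotate == 180:
        return (width - x, height - y)
    if rotate == 270:
        return (y, height - x)
    raise ValueError("rotate must be a multiple of 90")


# A point near the top-left of the portrait view of the scanned page.
print(visual_to_user(100, 700, 792, 612, 270))
```

With the page above (792 x 612, /Rotate 270), a point 100 from the left and 700 from the bottom of the portrait view maps to (700, 512) in the landscape MediaBox.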

How can I use pypdf to do this?

I have tried setting page.rotate(-270) and page.mediabox.upper_right(page.mediabox.top,page.mediabox.right)

However, the PDF no longer prints correctly after those transformations. I do not understand these things well enough to know how to do this correctly.

Answer 1

Score: 2

Thank you very much for your comments on cpdf. Ultimately I did not use cpdf and found a solution using pypdf.

I would like to share this solution in case anyone might find it useful.

  1. Most likely you are advanced enough at life to realize the direction documents go into your scanner. However, if you're like me, you might be putting your documents into your scanner in landscape position for which the pdf that is generated has the undesired qualities above. Just rotate your documents by hand 90 degrees before you place them into your scanner, no code required.

  2. In the case you want to correct the undesired rotation in your pdf and those rotations are exactly like those I described above, use the following pypdf command.

import pypdf
...
reader = pypdf.PdfReader(pdf_file)
writer = pypdf.PdfWriter()
for page in reader.pages:
    page.transfer_rotation_to_content()
    writer.add_page(page)
...
  3. If that doesn't work, and you're mouth-breathing as loudly as I was while trying to figure out, well, what a PDF is, this was the only reference I found that helped me understand how the 3x3 transformation matrix is used to do the transformations you want. It makes sense once you make sense of it, but not a moment before.

Note: Answer provided by OP in the question section.

Answer 2

Score: 0

This is a surprisingly complex process. You would need to set the rotation to zero, swap the x and y page size coordinates around, and then counter-rotate the page content to compensate. There are then complications like existing annotations which would need moving too. If PyPDF doesn't have a function to do this, it would be quite hard to write yourself.
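
The counter-rotation step boils down to one affine matrix per /Rotate value. The sketch below is plain Python, not a pypdf or cpdf API: rotation_matrix and apply are illustrative names, and the MediaBox is assumed to start at (0, 0). It returns the six numbers (a, b, c, d, e, f) of a PDF transformation matrix, where x' = a*x + c*y + e and y' = b*x + d*y + f.

```python
def rotation_matrix(width, height, rotate):
    """Matrix that moves content to where /Rotate would have displayed it.

    width and height are the MediaBox dimensions before rotation;
    rotate is the /Rotate value (clockwise, a multiple of 90).
    """
    rotate %= 360
    if rotate == 0:
        return (1, 0, 0, 1, 0, 0)
    if rotate == 90:
        return (0, -1, 1, 0, 0, width)
    if rotate == 180:
        return (-1, 0, 0, -1, width, height)
    if rotate == 270:
        return (0, 1, -1, 0, height, 0)
    raise ValueError("rotate must be a multiple of 90")


def apply(matrix, x, y):
    """Apply a PDF-style (a, b, c, d, e, f) matrix to the point (x, y)."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Corners of a 792 x 612 page with /Rotate 270 land inside a 612 x 792 box.
m = rotation_matrix(792, 612, 270)
print(apply(m, 0, 0), apply(m, 792, 612))
```

After a quarter turn the page size becomes height x width, which is the "swap the x and y page size coordinates" step. Existing annotations would need the same matrix applied to their rectangles, which this sketch does not attempt.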

Try preprocessing your files with cpdf -upright in.pdf -o out.pdf (or the upright function from https://github.com/coherentgraphics/python-libcpdf ) and then you can use PyPDF as normal.

huangapple
  • Published on March 1, 2023 at 11:32:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75599342.html