May 22, 2023 15:48:25

How can I convert image coordinates to PDF coordinates when using pdf2image and table-transformers?

Question

I am using pdf2image to convert a PDF to images and detecting tables with table-transformers. I need help with the coordinates.

The issue is that I get perfect table borders, but the pixel coordinates in the images differ from the PDF coordinates. Is there any way to convert image coordinates to PDF coordinates?
Here is my code for reference:

from pdf2image import convert_from_path

images = convert_from_path('/content/Sample Statement Format Bancslink.pdf')

for i in range(len(images)):
  images[i].save('/content/pages_sbi/page'+str(i)+'.jpeg')

Answer 1

Score: 1

Here is how to use PyMuPDF to transform image coordinates back to PDF page coordinates.

This of course works page by page. So in the following, an image file is assumed to be made from the corresponding page.

import fitz  # PyMuPDF

pno = 0  # 0-based page number (placeholder; set to the page you need)
doc = fitz.open("input.pdf")
page = doc[pno]
image = f"image{pno}.jpg"  # filename of the image rendered from that page

# rectangle in image coordinates, e.g. one that wraps a detected table:
# (x0, y0) is its top-left point, (x1, y1) its bottom-right point
x0, y0, x1, y1 = 100, 200, 500, 400  # placeholder values; use your detector's output
rect = fitz.Rect(x0, y0, x1, y1)

# load the JPEG as a PyMuPDF pixmap
pix = fitz.Pixmap(image)

# make a matrix that converts any image coordinates to page coordinates
mat = pix.irect.torect(page.rect)

# now every image coordinate can be converted to page coordinates,
# e.g. this is the table rect in page coordinates:
pdfrect = rect * mat

# if you don't want a PyMuPDF object as the rectangle, just use
# tuple(pdfrect) to retrieve the 4 coordinates
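
For intuition, the matrix produced by `torect` boils down to a per-axis scale plus an offset. The following plain-Python sketch reproduces that arithmetic without PyMuPDF. The function name `image_rect_to_page_rect` and the sizes in the example call are illustrative assumptions, not part of any library.

```python
def image_rect_to_page_rect(rect, image_size, page_rect):
    """Map a rectangle from image pixels to page coordinates.

    rect       -- (x0, y0, x1, y1) in image pixels, origin at the top-left
    image_size -- (width, height) of the rendered page image in pixels
    page_rect  -- (x0, y0, x1, y1) of the page in page coordinates
    """
    img_w, img_h = image_size
    px0, py0, px1, py1 = page_rect
    # Each axis gets its own scale factor.
    sx = (px1 - px0) / img_w
    sy = (py1 - py0) / img_h
    x0, y0, x1, y1 = rect
    return (px0 + x0 * sx, py0 + y0 * sy, px0 + x1 * sx, py0 + y1 * sy)


# Hypothetical numbers: a 1000 x 2000 pixel image of a 500 x 1000 point page.
table = image_rect_to_page_rect((100, 200, 300, 400), (1000, 2000), (0, 0, 500, 1000))
print(table)
```

Keeping separate factors for x and y matters whenever the image's aspect ratio does not exactly match the page's, for example because of pixel rounding during rendering.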

As an aside, PyMuPDF can also render pages to images. So if your table detection mechanism can be invoked page by page, you could make a loop like this:

1. Read the page using PyMuPDF.
2. Convert the page to an image. This can be done in memory, too.
3. Pass the page image to the table recognizer, which returns table coordinates.
4. Take the table coordinates and convert them to page coordinates as shown above.
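
The four steps above might be sketched as follows. This is only an outline under some assumptions: `tables_in_page_coords` and `detect_tables` are hypothetical names, the detector is assumed to accept the rendered pixmap (a real one would likely need it converted to its own image type first), and `get_pixmap(dpi=...)` assumes a reasonably recent PyMuPDF release. The scaling is written out by hand so the function relies only on `page.rect`, `pix.width`, and `pix.height`.

```python
def tables_in_page_coords(doc, detect_tables, dpi=150):
    """Return {page number: [(x0, y0, x1, y1), ...]} in page coordinates.

    doc           -- an opened PyMuPDF document, e.g. fitz.open("input.pdf")
    detect_tables -- hypothetical callable: takes the rendered pixmap and
                     returns table boxes as (x0, y0, x1, y1) in image pixels
    """
    results = {}
    for page in doc:  # step 1: read each page
        pix = page.get_pixmap(dpi=dpi)  # step 2: render the page in memory
        boxes = detect_tables(pix)  # step 3: table boxes in pixels
        # step 4: scale pixels to page units, one factor per axis
        sx = page.rect.width / pix.width
        sy = page.rect.height / pix.height
        ox, oy = page.rect.x0, page.rect.y0
        results[page.number] = [
            (ox + x0 * sx, oy + y0 * sy, ox + x1 * sx, oy + y1 * sy)
            for x0, y0, x1, y1 in boxes
        ]
    return results
```

With PyMuPDF installed, this could be called as `tables_in_page_coords(fitz.open("input.pdf"), my_detector)`, where `my_detector` stands in for whatever wraps the table-transformers model.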

Answer 2

Score: 0

I found a solution that should work in most cases.
Take this as your code for converting the PDF to images:

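import os

from pdf2image import convert_from_path

images = convert_from_path('PATH')

# create the output directory if it does not exist yet
os.makedirs('/content/pages', exist_ok=True)

for i, image in enumerate(images):
    image.save('/content/pages/page' + str(i) + '.jpeg')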

Next, read the page size from the PDF:

from pypdf import PdfReader

reader = PdfReader('PATH')
box = reader.pages[0].mediabox

# page size in PDF points
pdf_width = float(box.width)
pdf_height = float(box.height)

Then read the image and get its size in pixels:

import cv2
im = cv2.imread('/content/pages/page0.jpeg')
height, width, channels = im.shape

Now take x_1, y_1, x_2, and y_2 as coordinates in the image. To get the same location in the PDF, scale the x values by the width ratio and the y values by the height ratio:

x_1 = x_1 / width * pdf_width
y_1 = y_1 / height * pdf_height
x_2 = x_2 / width * pdf_width
y_2 = y_2 / height * pdf_height

Use these coordinates for your work.
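
If the images come from `convert_from_path` with its default settings, the scale can also be derived from the resolution: pdf2image renders at 200 DPI unless `dpi` is passed, and one PDF point is 1/72 inch by default. The sketch below assumes an unrotated page whose rendered size matches the media box. The helper names `pixels_to_points` and `flip_y` are illustrative. `flip_y` is only needed for tools that expect the native PDF convention of a bottom-left origin with y increasing upward.

```python
POINTS_PER_INCH = 72  # default PDF user space unit: 1/72 inch


def pixels_to_points(box, dpi=200):
    """Scale an (x0, y0, x1, y1) pixel box to PDF points (top-left origin)."""
    return tuple(v * POINTS_PER_INCH / dpi for v in box)


def flip_y(box, page_height):
    """Convert a top-left-origin box to a bottom-left-origin box."""
    x0, y0, x1, y1 = box
    # The old bottom edge (y1) becomes the new lower y value.
    return (x0, page_height - y1, x1, page_height - y0)


# Hypothetical detection on a page about 842 points tall, rendered at 200 DPI.
box_pt = pixels_to_points((200, 400, 1000, 800))
print(box_pt)
print(flip_y(box_pt, page_height=842))
```

Whether the flip is required depends on the library that consumes the coordinates, so it is worth checking one known box against the PDF before processing a whole document.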
