How can I convert image coordinates to PDF coordinates when using pdf2image and table-transformers?

Question

I am using pdf2image to convert a PDF to images and table-transformers to detect tables in those images. I need help with the coordinates.

The detected table borders are accurate, but they are expressed in image pixels, which do not match PDF coordinates. Is there a way to convert image coordinates to PDF coordinates?
Here is my code for reference:

from pdf2image import convert_from_path

# render every page of the PDF to a PIL image
images = convert_from_path('/content/Sample Statement Format Bancslink.pdf')

# save each page as a JPEG, numbered from 0
for i, image in enumerate(images):
    image.save(f'/content/pages_sbi/page{i}.jpeg')

Answer 1

Score: 1

Here is how to use PyMuPDF to transform image coordinates back to PDF page coordinates.

This works page by page, so the following assumes that each image file was made from the corresponding page.

import fitz  # PyMuPDF

# pno, x0, y0, x1 and y1 are placeholders for your own values
doc = fitz.open("input.pdf")
page = doc[pno]  # page number pno is 0-based
image = f"image{pno}.jpg"  # filename of the image made from this page

# rectangle in image coordinates, e.g. one that wraps a table
# x0, y0 are the coordinates of its top-left point
# x1, y1 are the coordinates of its bottom-right point
rect = fitz.Rect(x0, y0, x1, y1)

# make a PyMuPDF image (Pixmap) from the JPEG
pix = fitz.Pixmap(image)

# make a matrix that converts any image coordinates to page coordinates
mat = pix.irect.torect(page.rect)

# now every image coordinate can be converted to page coordinates
# e.g. this is the table rect in page coordinates:
pdfrect = rect * mat

# if you don't want a PyMuPDF object as the rectangle, just use
# tuple(pdfrect) to retrieve the 4 coordinates
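
To make the role of the matrix more concrete, here is a rough pure-Python sketch of what a rectangle-to-rectangle mapping amounts to: a scale per axis plus a shift. This is not PyMuPDF's implementation; `rect_to_rect`, `apply` and all the numbers (a US Letter page rendered at 200 dpi, a made-up table box) are assumptions chosen for illustration.

```python
def rect_to_rect(src, dst):
    """Return (sx, sy, tx, ty) so that x' = x*sx + tx and y' = y*sy + ty
    map rectangle src onto rectangle dst. Rectangles are (x0, y0, x1, y1)."""
    sx = (dst[2] - dst[0]) / (src[2] - src[0])  # horizontal scale
    sy = (dst[3] - dst[1]) / (src[3] - src[1])  # vertical scale
    tx = dst[0] - src[0] * sx                   # horizontal shift
    ty = dst[1] - src[1] * sy                   # vertical shift
    return sx, sy, tx, ty


def apply(rect, mapping):
    """Apply the mapping to both corners of a rectangle."""
    sx, sy, tx, ty = mapping
    x0, y0, x1, y1 = rect
    return (x0 * sx + tx, y0 * sy + ty, x1 * sx + tx, y1 * sy + ty)


# Assumed sizes: a 612 x 792 point page rendered to 1700 x 2200 pixels
image_rect = (0, 0, 1700, 2200)
page_rect = (0, 0, 612, 792)
mapping = rect_to_rect(image_rect, page_rect)

table_px = (170, 220, 850, 1100)    # hypothetical table box in pixels
table_pt = apply(table_px, mapping)  # the same box in page coordinates
print(table_pt)
```

When both rectangles start at (0, 0), as in this example, the shift is zero and the conversion reduces to multiplying by the ratio of page size to image size.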

As an aside, PyMuPDF can also render pages to images. So if your table detection mechanism can be invoked page by page, you could make a loop like this:

  1. Read page using PyMuPDF
  2. Convert page to an image. Could be in memory, too.
  3. Pass page image to table recognizer, which returns table coordinates
  4. Use table coordinates and convert them to page coordinates as shown above.
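
The four steps could be sketched roughly as follows. `tables_in_page_coords` and `detect_tables` are hypothetical names: the detector is assumed to accept the rendered Pixmap and return pixel boxes as (x0, y0, x1, y1), and turning the Pixmap into whatever input your model expects is left to that callable. For brevity the sketch scales the coordinates directly by the page-to-image size ratio rather than building a matrix.

```python
def tables_in_page_coords(doc, detect_tables, dpi=200):
    """Yield (page_number, (x0, y0, x1, y1)) in page coordinates.

    doc           -- an open PyMuPDF document, e.g. fitz.open("input.pdf")
    detect_tables -- hypothetical callable: Pixmap -> list of pixel boxes
    dpi           -- rendering resolution; 200 is an arbitrary choice here
    """
    for page in doc:                          # 1. read each page
        pix = page.get_pixmap(dpi=dpi)        # 2. render it in memory
        boxes = detect_tables(pix)            # 3. detect tables (pixels)
        sx = page.rect.width / pix.width      # 4. scale back to the page
        sy = page.rect.height / pix.height
        for x0, y0, x1, y1 in boxes:
            yield page.number, (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
```

With PyMuPDF installed, a call might look like `list(tables_in_page_coords(fitz.open("input.pdf"), my_detector))`, where `my_detector` stands for your own wrapper around the table model.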

Answer 2

Score: 0

I found a solution that should work in most cases.
Take this as your code for converting the PDF to images:

import os
from pdf2image import convert_from_path

images = convert_from_path('PATH')

# create the output folder if it does not exist yet
os.makedirs('/content/pages', exist_ok=True)

for i, image in enumerate(images):
    image.save(f'/content/pages/page{i}.jpeg')

Next, get the page size from the PDF:

from pypdf import PdfReader

reader = PdfReader('PATH')
box = reader.pages[0].mediabox  # media box of the first page

pdf_width = float(box.width)    # in PDF points
pdf_height = float(box.height)

Then read the image and get its size in pixels:

import cv2
im = cv2.imread('/content/pages/page0.jpeg')
height, width, channels = im.shape  # OpenCV returns (rows, columns, channels)

Now let x_1, y_1, x_2 and y_2 be the coordinates of a box in the image. To get the corresponding location in the PDF, scale each axis by the ratio of PDF size to image size:

x_1 = x_1 / width * pdf_width
y_1 = y_1 / height * pdf_height
x_2 = x_2 / width * pdf_width
y_2 = y_2 / height * pdf_height

Use these coordinates for your work.
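
Because pdf2image renders at a fixed resolution (its `dpi` argument, 200 by default) and PDF units are points of 1/72 inch, the same scale factor can also be derived from the DPI alone. The sketch below assumes an unrotated page whose media box starts at (0, 0); `pixels_to_pdf` and the sample numbers are illustrative, not part of any library. It can optionally flip the y axis, since image rows count down from the top while native PDF coordinates count up from the bottom of the page.

```python
POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def pixels_to_pdf(box, dpi=200, page_height=None):
    """Convert a pixel box (x0, y0, x1, y1) to PDF points.

    dpi         -- resolution the page image was rendered at
    page_height -- page height in points; if given, the result uses a
                   bottom-left origin instead of the image's top-left one
    """
    scale = POINTS_PER_INCH / dpi
    x0, y0, x1, y1 = (v * scale for v in box)
    if page_height is None:
        return (x0, y0, x1, y1)
    # Flip vertically: the image's top edge becomes the larger y value
    return (x0, page_height - y1, x1, page_height - y0)


box_px = (200, 400, 1000, 1200)                        # hypothetical table box
top_left_origin = pixels_to_pdf(box_px)                # y grows downward
bottom_left_origin = pixels_to_pdf(box_px, page_height=792)  # y grows upward
print(top_left_origin, bottom_left_origin)
```

Which of the two results you need depends on the library that consumes the coordinates, so it is worth checking one known box against the PDF before relying on either.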

huangapple
  • Published on May 22, 2023 at 15:48:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76304011.html