How can I convert image coordinates to PDF coordinates when using pdf2image and table-transformers?
Question
I am using pdf2image to convert a PDF to images and detecting tables with table-transformers. I need help with the coordinates.
The issue is that I get accurate table borders, but the pixel coordinates in the images differ from the PDF coordinates. Is there a way to convert image coordinates to PDF coordinates?
Here is my code for reference:
from pdf2image import convert_from_path
images = convert_from_path('/content/Sample Statement Format Bancslink.pdf')
for i, image in enumerate(images):
    image.save(f'/content/pages_sbi/page{i}.jpeg')
Answer 1
Score: 1
Here is how to use PyMuPDF to transform image coordinates back to PDF page coordinates.
This works page by page, so the following assumes that each image file was rendered from the corresponding page.
import fitz  # PyMuPDF import

pno = 0  # page number (0-based); placeholder, set to the page you need
doc = fitz.open("input.pdf")
page = doc[pno]
image = f"image{pno}.jpg"  # filename of the matching image of the page

# rectangle in image pixels, e.g. one that wraps a table in the image:
# x0, y0 are the coordinates of its top-left point,
# x1, y1 those of the bottom-right point
# (placeholder values; use your table detector's output)
x0, y0, x1, y1 = 100, 200, 900, 600
rect = fitz.Rect(x0, y0, x1, y1)

# make a PyMuPDF image (Pixmap) from the JPEG
pix = fitz.Pixmap(image)

# make a matrix that converts any image coordinates to page coordinates
mat = pix.irect.torect(page.rect)

# now every image coordinate can be converted to page coordinates,
# e.g. this is the table rect in page coordinates:
pdfrect = rect * mat

# if you don't want a PyMuPDF object, use tuple(pdfrect)
# to retrieve the 4 coordinates
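For intuition, here is a plain-Python sketch of the mapping that `pix.irect.torect(page.rect)` produces. It can be checked without PyMuPDF. Each axis is scaled by page size divided by image size, and the result is then shifted by the top-left corner of the page rectangle. The function name `image_rect_to_page_rect` and the example sizes are illustrative. The sketch assumes the image shows the whole page rectangle.

```python
def image_rect_to_page_rect(rect, image_size, page_rect):
    """Map a rectangle from image pixels to page coordinates.

    rect       -- (x0, y0, x1, y1) in image pixels, origin at top-left
    image_size -- (width, height) of the rendered image in pixels
    page_rect  -- (x0, y0, x1, y1) of the page, e.g. tuple(page.rect)
    """
    img_w, img_h = image_size
    px0, py0, px1, py1 = page_rect
    # per-axis scale factors: page units per pixel
    sx = (px1 - px0) / img_w
    sy = (py1 - py0) / img_h
    x0, y0, x1, y1 = rect
    # scale each corner, then shift by the page rectangle's top-left corner
    return (px0 + x0 * sx, py0 + y0 * sy, px0 + x1 * sx, py0 + y1 * sy)


# US Letter page (612 x 792 pt) rendered at 200 dpi gives 1700 x 2200 px
print(image_rect_to_page_rect((170, 220, 850, 1100), (1700, 2200), (0, 0, 612, 792)))
```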
As an aside, PyMuPDF can also render pages to images. So if your table detection can be invoked page by page, you could write a loop like this:
- Read the page using PyMuPDF
- Convert the page to an image. This can also be done in memory.
- Pass the page image to the table recognizer, which returns the table coordinates
- Convert the table coordinates to page coordinates as shown above.
Answer 2
Score: 0
Alright, I found a solution that should work in most cases.
Suppose this is your PDF-to-image code:
import os
from pdf2image import convert_from_path

images = convert_from_path('PATH')
os.makedirs('/content/pages', exist_ok=True)  # instead of the notebook-only "!mkdir pages"
for i, image in enumerate(images):
    image.save(f'/content/pages/page{i}.jpeg')
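If you know the resolution the pages were rendered at, you can also convert without reading the PDF size. PDF coordinates are measured in points (1/72 inch), and pdf2image's `convert_from_path` renders at `dpi=200` by default, so one pixel corresponds to 72/200 points.

This is a minimal sketch that assumes the default `dpi` was used; otherwise pass the same value you gave to `convert_from_path`. The helper name `pixels_to_points` is hypothetical. Because rendered image sizes are whole pixels, the result may be off by a fraction of a pixel.

```python
PDF_POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def pixels_to_points(value, dpi=200):
    """Convert a pixel coordinate of a rendered page to PDF points.

    dpi must match the resolution used for rendering; 200 is the
    default of pdf2image's convert_from_path.
    """
    return value * PDF_POINTS_PER_INCH / dpi


# a table box detected at pixel coordinates (170, 220)-(850, 1100)
box_px = (170, 220, 850, 1100)
box_pt = tuple(pixels_to_points(v) for v in box_px)
print(box_pt)
```

The result keeps the image's top-left origin; only the unit changes from pixels to points.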
Next, get the page size of the PDF:
from pypdf import PdfReader
reader = PdfReader('PATH')
box = reader.pages[0].mediabox
pdf_width = box.width
pdf_height = box.height
Then read the image and get its size in pixels:
import cv2
im = cv2.imread('/content/pages/page0.jpeg')
height, width, channels = im.shape
Now let x_1, y_1, x_2, and y_2 be coordinates in the image. To get the same location in the PDF, scale the x values by the width ratio and the y values by the height ratio:
x_1 = x_1 / width * pdf_width
y_1 = y_1 / height * pdf_height
x_2 = x_2 / width * pdf_width
y_2 = y_2 / height * pdf_height
Use these coordinates for your work.
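Coordinates computed this way are in PDF points but are still measured from the top-left corner, like the image. PDF user space puts its origin at the bottom-left with y increasing upward, and a page's mediabox does not always start at (0, 0). If the library you pass the coordinates to expects native PDF coordinates, flip the y-axis and add the mediabox offset.

This is a minimal sketch that assumes:
- the page is unrotated;
- the image was rendered from the mediabox, which is pdf2image's default;
- the detector returns (x_1, y_1) as the top-left corner.

The function name and argument layout are illustrative. The mediabox tuple can come from pypdf, e.g. `(box.left, box.bottom, box.right, box.top)`.

```python
def image_box_to_pdf_user_space(box, image_size, mediabox):
    """Map an image-pixel box to a PDF user-space rectangle.

    box        -- (x_1, y_1, x_2, y_2) in pixels, (x_1, y_1) = top-left corner
    image_size -- (width, height) of the image in pixels
    mediabox   -- (left, bottom, right, top) of the page in PDF points
    Returns (llx, lly, urx, ury): lower-left and upper-right corners in
    points, with the origin at the bottom-left as in native PDF coordinates.
    """
    width, height = image_size
    left, bottom, right, top = mediabox
    sx = (right - left) / width   # points per pixel along x
    sy = (top - bottom) / height  # points per pixel along y
    x_1, y_1, x_2, y_2 = box
    # image y grows downward, PDF y grows upward, so measure down from the top
    return (
        left + x_1 * sx,
        top - y_2 * sy,  # bottom edge of the box
        left + x_2 * sx,
        top - y_1 * sy,  # top edge of the box
    )


print(image_box_to_pdf_user_space((170, 220, 850, 1100), (1700, 2200), (0, 0, 612, 792)))
```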