Document AI - Converting the normalized_vertices to the original scale of the document
Question
I am using the Google Cloud Document AI service. I have built custom processors for "form data extraction" with the "Custom Entity Extractor", which processes PDF documents.
I annotated the dataset and finished training my model.
I can now access the processor through the Python SDK, send input requests, and fetch responses.
While parsing the response, under result.document.entities[0].page_anchor.page_refs[0].bounding_poly.normalized_vertices
I get normalized coordinate values, on a scale from 0 to 1, that represent the location of the entity/value on a given page of the PDF.
A sample of the values is shown below:
[x: 0.30874478816986084
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.36359813809394836
x: 0.30874478816986084
y: 0.36359813809394836]
Under the page object, result.document.pages[0], I get the page dimensions in pixels. An example response looks like this:
dimension {
width: 1681.0
height: 2379.0
unit: "pixels"
}
My expectations:
I want to get the positions of the entities by scaling up the normalized coordinates, and then crop that part of the PDF page, which is converted to an image using the pdf2image module.
I am using the cv2 module for image processing here.
Answer 1
Score: 1
The Document AI Toolbox SDK for Python has functionality to export images from an Entity bounding box. Currently, it's set to export only detected images (such as a profile photo from a driver's license), but the same code should work to export an image of an entity with text.
Something like this should work for you:
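from io import BytesIO

from PIL import Image

# documentai_entity and documentai_document come from the processor response
page_ref = documentai_entity.page_anchor.page_refs[0]
doc_page = documentai_document.pages[page_ref.page]
# Load the page image that Document AI returns with the response
image_content = doc_page.image.content
doc_image = Image.open(BytesIO(image_content))
w, h = doc_image.size
# Scale each normalized vertex to pixels, rounding to the nearest pixel
vertices = [
    (int(v.x * w + 0.5), int(v.y * h + 0.5))
    for v in page_ref.bounding_poly.normalized_vertices
]
# Each vertex is (x, y); on an upright page, index 0 is top-left and index 2 is bottom-right
(left, top), (right, bottom) = vertices[0], vertices[2]
# PIL's crop expects (left, upper, right, lower)
entity_image = doc_image.crop((left, top, right, bottom))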
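If you render the page yourself (for example with pdf2image) and crop with cv2, the scaling step can be separated from the image library. The following is only a minimal sketch: normalized_to_pixel_box is a hypothetical helper name, not part of any SDK, and it assumes the vertices are passed as plain (x, y) tuples. It takes the min/max over all vertices rather than relying on vertex order, and clamps values to the 0-1 range. The sample values and the 1681 x 2379 page size are the ones from the question.

```python
from typing import Iterable, Tuple


def normalized_to_pixel_box(
    vertices: Iterable[Tuple[float, float]],
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Scale 0-1 (x, y) pairs to a (left, top, right, bottom) pixel box.

    Hypothetical helper for illustration; not part of the Document AI SDK.
    """
    xs, ys = [], []
    for x, y in vertices:
        # Clamp to [0, 1] so the box never falls outside the image
        xs.append(min(max(x, 0.0), 1.0) * image_width)
        ys.append(min(max(y, 0.0), 1.0) * image_height)
    if not xs:
        raise ValueError("at least one vertex is required")
    # min/max makes the result independent of the vertex order
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Sample vertices and page dimensions taken from the question
sample = [
    (0.30874478816986084, 0.34131988883018494),
    (0.47531232237815857, 0.34131988883018494),
    (0.47531232237815857, 0.36359813809394836),
    (0.30874478816986084, 0.36359813809394836),
]
box = normalized_to_pixel_box(sample, 1681, 2379)
print(box)  # (519, 812, 799, 865)
```

Pass the width and height of the image you actually crop, not necessarily the dimension reported in the response: an image rendered by pdf2image may have a different pixel size depending on the DPI used, and the normalized values only map correctly onto the image's own size. With an OpenCV/NumPy array, the size comes from h, w = img.shape[:2] and the crop is img[top:bottom, left:right], since rows (y) are indexed first.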