June 9, 2023 02:56:13

Document AI - Converting the normalized_vertices to the original scale of the document

Question

I am using the Google Cloud Document AI service. I have built custom processors for form data extraction using the Custom Entity Extractor, which processes PDF documents.
I annotated the dataset and finished training my model.
I can now access the processor through the Python SDK, send input requests, and fetch responses.

When parsing the response, I get normalized coordinate values under result.document.entities[0].page_anchor.page_refs[0].bounding_poly.normalized_vertices. They are on a scale from 0 to 1 and represent the location of the entity value on a given page of the PDF.

A sample of the values is shown below:

[x: 0.30874478816986084
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.36359813809394836
x: 0.30874478816986084
y: 0.36359813809394836]

Under result.document.pages[0], the page dimension object gives the size of the page in pixels. An example response looks like this:

dimension {
  width: 1681.0
  height: 2379.0
  unit: "pixels"
}
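
For illustration, the scaling asked about here amounts to multiplying each normalized x by the page width and each normalized y by the page height. The short sketch below applies that to the sample values above; the variable names are illustrative and not part of the Document AI SDK, and it assumes the cropped image has the same pixel size as the reported page dimension.

```python
# Sample values copied from the response shown above, as (x, y) pairs.
normalized_vertices = [
    (0.30874478816986084, 0.34131988883018494),
    (0.47531232237815857, 0.34131988883018494),
    (0.47531232237815857, 0.36359813809394836),
    (0.30874478816986084, 0.36359813809394836),
]
page_width, page_height = 1681.0, 2379.0

# Multiply by the page size and round to the nearest whole pixel.
pixel_vertices = [
    (round(x * page_width), round(y * page_height))
    for x, y in normalized_vertices
]
print(pixel_vertices)
# [(519, 812), (799, 812), (799, 865), (519, 865)]
```

So for this sample the entity would sit roughly in the box from (519, 812) to (799, 865) on a 1681 x 2379 pixel page.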

My expectation:

I want to obtain the positions of the entities by scaling up the normalized coordinates, and then crop that part of the PDF page, which I convert to an image using the pdf2image module.

I am using the cv2 module for image processing.

Answer 1

Score: 1

The Document AI Toolbox SDK for Python has functionality to export images from an Entity bounding box. Currently, it is set to export only detected images (such as a profile photo from a driver's license), but the same code should work to export an image of an entity with text.

https://github.com/googleapis/python-documentai-toolbox/blob/c1843812d988b4a9877b66176be8d103b55b112a/google/cloud/documentai_toolbox/wrappers/entity.py#LL66C5-L90C64

Something like this should work for you:

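from io import BytesIO

from PIL import Image

# documentai_document is the Document from the process response;
# documentai_entity is one item of documentai_document.entities.
page_ref = documentai_entity.page_anchor.page_refs[0]
doc_page = documentai_document.pages[page_ref.page]
# Encoded bytes of the page image stored in the response.
image_content = doc_page.image.content

doc_image = Image.open(BytesIO(image_content))
w, h = doc_image.size
# Scale each normalized vertex (0-1) to pixels, rounding to the nearest integer.
vertices = [
    (int(v.x * w + 0.5), int(v.y * h + 0.5))
    for v in page_ref.bounding_poly.normalized_vertices
]
# Each vertex is an (x, y) pair. For upright text, index 0 is the
# top-left corner and index 2 is the bottom-right corner.
(left, top), (right, bottom) = vertices[0], vertices[2]
# PIL's crop takes a (left, upper, right, lower) box.
entity_image = doc_image.crop((left, top, right, bottom))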
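
If the page is rendered separately (for example with pdf2image, as in the question), the rendered image may not have the same pixel size as the page dimension reported by Document AI, so it is probably safer to scale by the size of the image that is actually being cropped. The following is a minimal sketch of that idea; `normalized_to_pixel_box` is a hypothetical helper name, not part of any SDK, and it assumes the vertices are passed as plain (x, y) pairs.

```python
def normalized_to_pixel_box(normalized_vertices, image_width, image_height):
    """Return (left, top, right, bottom) in pixels.

    normalized_vertices: iterable of (x, y) pairs on a 0-1 scale.
    image_width, image_height: size of the image that will be cropped.
    """
    points = list(normalized_vertices)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def scale(value, size):
        # Round to the nearest pixel and clamp into the image bounds.
        return max(0, min(size, round(value * size)))

    # Use min/max over all vertices so the vertex order does not matter.
    return (
        scale(min(xs), image_width),
        scale(min(ys), image_height),
        scale(max(xs), image_width),
        scale(max(ys), image_height),
    )


# Example with made-up values: a box on a 1000 x 2000 pixel image.
box = normalized_to_pixel_box(
    [(0.25, 0.5), (0.75, 0.5), (0.75, 0.6), (0.25, 0.6)], 1000, 2000
)
print(box)  # (250, 1000, 750, 1200)
```

With Document AI objects, the pairs could be built as `[(v.x, v.y) for v in page_ref.bounding_poly.normalized_vertices]`. For a cv2 image `img` (a NumPy array indexed rows first), the size is `height, width = img.shape[:2]` and the crop would be `img[top:bottom, left:right]`.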
