How to extract different sections of a PDF with Document AI
Question
I want to be able to show a list of the different sections of a PDF file, like what is shown on the. I'm calling the processor through the REST API from Flutter Web.
I tried getting the entities from the API response using fieldMask, but got nothing for the document in the picture. I'm not sure which fields should be used to get the desired response.
Answer 1
Score: 2
The Document OCR processor returns text and layout information in the Document JSON format. Each of the sections highlighted in the UI is a Block or a Paragraph, so you will need to parse the JSON response to get the data for each section, including its bounding box.
You can refer to Handle the processing response > Text, layout, and quality scores in the documentation for an explanation of how the output is structured and code samples for parsing it.
You can also refer to these open source sample web applications, which show use cases similar to what you are asking about:
- https://github.com/GoogleCloudPlatform/document-ai-samples/tree/main/web-app-demo
- https://github.com/GoogleCloudPlatform/document-ai-samples/tree/main/web-app-pix2info-python
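As a rough illustration of the parsing step described above, here is a minimal Python sketch that walks a Document JSON object and collects the text and bounding box of each Paragraph (or Block). It assumes the camelCase field names used by the REST response (text, pages, paragraphs, blocks, layout.textAnchor.textSegments, layout.boundingPoly.normalizedVertices). The helper names layout_text and extract_sections and the SAMPLE_RESPONSE data are hypothetical, written by hand for this example rather than taken from a real processor call. The same logic should carry over to Dart.

```python
def layout_text(layout, full_text):
    """Join the slices of the document text that a layout's textAnchor points at."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # int64 fields arrive as strings in the JSON; startIndex is omitted when it is 0.
    return "".join(
        full_text[int(seg.get("startIndex", 0)):int(seg.get("endIndex", 0))]
        for seg in segments
    )


def extract_sections(document, kind="paragraphs"):
    """Return one dict per paragraph (or block) on every page of the document."""
    full_text = document.get("text", "")
    sections = []
    for page in document.get("pages", []):
        for item in page.get(kind, []):
            layout = item.get("layout", {})
            vertices = layout.get("boundingPoly", {}).get("normalizedVertices", [])
            sections.append({
                "page": page.get("pageNumber"),
                "text": layout_text(layout, full_text).strip(),
                # Normalized vertices are fractions of the page width and height.
                "box": [(v.get("x", 0.0), v.get("y", 0.0)) for v in vertices],
            })
    return sections


# Hand-made sample (an assumption for illustration, not real processor output).
SAMPLE_RESPONSE = {
    "document": {
        "text": "Invoice\nTotal: 42\n",
        "pages": [{
            "pageNumber": 1,
            "paragraphs": [
                {"layout": {
                    "textAnchor": {"textSegments": [{"endIndex": "8"}]},
                    "boundingPoly": {"normalizedVertices": [
                        {"x": 0.1, "y": 0.1}, {"x": 0.5, "y": 0.1},
                        {"x": 0.5, "y": 0.15}, {"x": 0.1, "y": 0.15}]},
                }},
                {"layout": {
                    "textAnchor": {"textSegments": [
                        {"startIndex": "8", "endIndex": "18"}]},
                    "boundingPoly": {"normalizedVertices": [
                        {"x": 0.1, "y": 0.2}, {"x": 0.6, "y": 0.2},
                        {"x": 0.6, "y": 0.25}, {"x": 0.1, "y": 0.25}]},
                }},
            ],
        }],
    }
}

sections = extract_sections(SAMPLE_RESPONSE["document"])
for section in sections:
    print(section["page"], repr(section["text"]), section["box"])
```

Passing kind="blocks" reads the coarser Block level instead. To draw the boxes over a rendered page, the normalized coordinates would need to be multiplied by the displayed page width and height.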