Question

I am training a GCP Document AI custom processor for my project. It seems the processor does not recognize Japanese text at all. Is there an option to enable Japanese language support?

Answer 1

Score: 2

Custom Document Extractor does not currently support Japanese (language code ja).

If you want Japanese language support for Custom Document Extractor to be implemented, you can open a new feature request on the issue tracker describing your requirement.

For more information about custom processors, see the Document AI documentation.

Answer 2

Score: 2

The other answer is accurate. Custom Document Extractor currently doesn't support Japanese, but it is on the product roadmap for H1 2023. There is a workaround that could work for you until the feature is implemented.

Note: This is not intended to be a permanent solution, but it can extend the language capabilities of Document AI Workbench for the time being.

1. Pre-process your training documents with the Document OCR processor, which supports Japanese.
2. Save the output ProcessResponse JSON files, then remove the HumanReviewStatus and unwrap the Document object.
- (i.e. the JSON should start with uri: "").
3. Import the Document JSON files you have created into a Document AI Workbench dataset and label the documents.
- Note: Schema labels can only be defined in English.
4. During prediction, pre-process your documents with the Document OCR processor, then send the output into the Custom Document Extractor for prediction.
- Note: This only works for online processing, not batch processing.
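The unwrapping step and the two-stage online prediction can be sketched in Python roughly as follows. This is a minimal sketch, not an official recipe: it assumes the saved ProcessResponse JSON uses the camelCase keys "document" and "humanReviewStatus", and that the google-cloud-documentai client library is installed for the prediction part. The function names, file paths, and processor resource names are hypothetical placeholders.

```python
import json
from pathlib import Path


def unwrap_process_response(response: dict) -> dict:
    """Return the bare Document from a ProcessResponse-shaped dict.

    Assumes the top-level keys are "document" and (optionally)
    "humanReviewStatus"; the review status is dropped by returning
    only the nested Document.
    """
    if "document" not in response:
        raise ValueError("expected a ProcessResponse with a 'document' key")
    return response["document"]


def convert_file(src: Path, dst: Path) -> None:
    """Read a saved ProcessResponse JSON file and write the unwrapped Document."""
    response = json.loads(src.read_text(encoding="utf-8"))
    document = unwrap_process_response(response)
    # ensure_ascii=False keeps Japanese text readable in the output file.
    dst.write_text(json.dumps(document, ensure_ascii=False), encoding="utf-8")


def ocr_then_extract(content: bytes, mime_type: str,
                     ocr_processor_name: str, extractor_name: str):
    """Online processing only: run Document OCR, then the custom extractor.

    Both *_name arguments are full processor resource names
    (hypothetical values supplied by the caller). Imported lazily so the
    helpers above work without the client library installed.
    """
    from google.cloud import documentai

    # Default endpoint; other regions may need client_options (assumption).
    client = documentai.DocumentProcessorServiceClient()

    # Stage 1: OCR the raw file with the Japanese-capable OCR processor.
    ocr_result = client.process_document(
        request=documentai.ProcessRequest(
            name=ocr_processor_name,
            raw_document=documentai.RawDocument(
                content=content, mime_type=mime_type
            ),
        )
    )

    # Stage 2: pass the OCR'd Document inline to the custom extractor.
    extract_result = client.process_document(
        request=documentai.ProcessRequest(
            name=extractor_name,
            inline_document=ocr_result.document,
        )
    )
    return extract_result.document
```

For training data, convert_file would be run once per saved OCR response before importing the results into the Workbench dataset. For prediction, ocr_then_extract chains the two processors in a single online call path, which matches the note that the workaround does not apply to batch processing.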
