Japanese OCR for GCP Document AI custom processor
Question
I am training the GCP Document AI custom processor for my project. It seems the processor does not recognize Japanese text at all. Is there an option to enable Japanese language support?
Answer 1
Score: 2
Currently, the `ja` (Japanese) language is not supported in Custom Document Extractor.
If you want Japanese language support to be implemented for Custom Document Extractor, you can open a new feature request on the issue tracker describing your requirement.
For more information regarding custom processors, you can refer to this documentation.
Answer 2
Score: 2
This comment is accurate. Custom Document Extractor currently doesn't support Japanese, but it is on the product roadmap for H1 2023. There is a workaround that could work for you until the feature is implemented.
Note: This is not intended to be a permanent solution, but it can increase language capabilities for Document AI Workbench for the time being.
- Pre-process your training documents with the Document OCR processor, which supports Japanese.
- Save the output `ProcessResponse` JSON files, then remove the `HumanReviewStatus` and unwrap the `Document` object.
  - (i.e. the JSON should start with `uri: ""`).
- Import the `Document` JSON files you have created into a Document AI Workbench dataset and label the documents.
  - Note: Schema labels can only be defined in English.
- During prediction, pre-process your documents with the Document OCR processor, then send the output to the Custom Document Extractor for prediction.
  - Note: This only works for online processing, not batch processing.
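The second step of the workaround (save the `ProcessResponse` JSON, remove the `HumanReviewStatus`, unwrap the `Document`) can be sketched in Python with only the standard library. This is a minimal sketch, not an official tool: it assumes each file was saved in the proto3 JSON format, where the fields are named `document` and `humanReviewStatus` (the snake_case `human_review_status` is also dropped in case the file was written with preserved proto field names). `unwrap_process_response` and `convert_file` are hypothetical helper names.

```python
import json
from typing import Any, Dict


def unwrap_process_response(response_json: str) -> Dict[str, Any]:
    """Return the bare Document object from a saved ProcessResponse JSON string.

    Assumption (hypothetical helper): proto3 JSON field names
    ("document", "humanReviewStatus"); "human_review_status" is also removed.
    """
    response = json.loads(response_json)
    # Remove the human review metadata, as the workaround describes.
    response.pop("humanReviewStatus", None)
    response.pop("human_review_status", None)
    # Unwrap: the nested Document becomes the top-level JSON object.
    document = response.get("document")
    if not isinstance(document, dict):
        raise ValueError("ProcessResponse JSON has no 'document' object")
    return document


def convert_file(src_path: str, dst_path: str) -> None:
    """Read one ProcessResponse JSON file and write the unwrapped Document JSON."""
    with open(src_path, encoding="utf-8") as src:
        document = unwrap_process_response(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps Japanese text readable instead of \u escapes.
        json.dump(document, dst, ensure_ascii=False, indent=2)
```

For example, `convert_file("invoice_ocr_response.json", "invoice_document.json")` (hypothetical file names) writes a file whose top-level object is the `Document`. Before importing it into the Workbench dataset, check that it starts with `uri: ""` as the answer describes; the key order is carried over unchanged from the saved OCR output.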