Question

Is it possible to use one Custom Document Extractor for different documents? Training and testing data would be provided for each document, but would this affect the overall efficiency of the processor? Is it recommended to create a processor per document?

Answer 1

Score: 0

It is recommended to create a separate Custom Document Extractor for each document type you want to process. You can try using the same processor for multiple types if they have similar structures, but generally the more specific the training data a processor is trained on, the better its extraction quality.

You can also create a Custom Document Classifier to classify unknown document types, then use that output to send each document to the appropriate extractor processor.
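The routing step could look roughly like the sketch below. It is only an illustration: the `Classification` type, the `EXTRACTOR_BY_TYPE` mapping, the processor IDs, and the 0.5 confidence threshold are all hypothetical, and the actual calls to the classifier and extractor processors are left out. It only shows how a classifier's labels and confidence scores might be turned into a choice of extractor.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Classification:
    """One candidate label from a classifier (hypothetical shape)."""
    label: str
    confidence: float


# Hypothetical mapping from document type to extractor processor ID.
EXTRACTOR_BY_TYPE = {
    "invoice": "invoice-extractor-id",
    "receipt": "receipt-extractor-id",
}


def pick_extractor(
    candidates: Sequence[Classification],
    min_confidence: float = 0.5,  # assumed threshold, tune for your data
) -> Optional[str]:
    """Return the extractor ID for the most confident label, or None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    if best.confidence < min_confidence:
        return None  # too uncertain: send to manual review instead
    # Unknown labels also yield None, so no extractor is chosen by accident.
    return EXTRACTOR_BY_TYPE.get(best.label)
```

Returning `None` for low-confidence or unmapped labels keeps those documents away from an extractor that was not trained on them.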
