One Processor For Multiple Documents
Question
Is it possible to use one Custom Document Extractor for different documents? Each document would have its own training and test data, but would this affect the overall efficiency of the processor? Or is it recommended to create a processor per document?
Answer 1
Score: 0
It is recommended to create a separate Custom Document Extractor for each document type you want to process. You can try using the same processor for multiple types if they have similar structures, but extraction quality is generally better the more specific the training data is to a single document type.
You can also create a Custom Document Classifier to classify unknown document types, then use that output to send each document to the appropriate extractor processor.
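To make the classify-then-route idea concrete, here is a minimal sketch of just the routing step in plain Python. It does not call Document AI. The `Prediction` type, the `pick_extractor` helper, the labels, the processor IDs and the 0.7 threshold are all hypothetical names and values chosen for illustration. In a real pipeline you would build the predictions from whatever your classifier processor returns, then send the document to the extractor whose ID comes back.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Prediction:
    """One classifier guess: a document-type label and its confidence score."""
    label: str
    confidence: float


# Hypothetical mapping from classifier label to extractor processor ID.
EXTRACTOR_BY_LABEL = {
    "invoice": "extractor-invoice-id",
    "receipt": "extractor-receipt-id",
}


def pick_extractor(
    predictions: Sequence[Prediction],
    routes: Mapping[str, str] = EXTRACTOR_BY_LABEL,
    min_confidence: float = 0.7,  # assumed threshold, tune for your data
) -> Optional[str]:
    """Return the extractor ID for the most confident prediction.

    Returns None when there are no predictions, the best score is below
    the threshold, or no extractor is registered for the predicted label.
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p.confidence)
    if best.confidence < min_confidence:
        return None
    return routes.get(best.label)


if __name__ == "__main__":
    preds = [Prediction("invoice", 0.92), Prediction("receipt", 0.05)]
    print(pick_extractor(preds))
```

A `None` result is a reasonable place to hook in a fallback such as manual review, rather than forcing a low-confidence document through an extractor that was trained on a different layout.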