One Processor For Multiple Documents

Question

Is it possible to use one Custom Document Extractor for different documents? It would be given training and test data for each document, but would this affect the processor's overall efficiency? Or is it recommended to create one processor per document?

Answer 1

Score: 0

It is recommended to create a separate Custom Document Extractor for each document type you want to process. You can try using the same processor for multiple types if they have similar structures, but extraction quality is generally better the more specific the processor's training data is.

You can also create a Custom Document Classifier to classify unknown document types, then use that output to send each document to the appropriate extractor processor.
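
As a rough illustration of that classify-then-route flow, here is a minimal sketch in plain Python. It does not call any real service: `route_document`, the `Classifier` and `Extractor` callables, and the `min_confidence` threshold are all hypothetical names introduced here, and each callable stands in for a request to a trained classifier or extractor processor.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for processor calls (not a real library API):
# a classifier returns (label, confidence) pairs, an extractor returns fields.
Classifier = Callable[[bytes], List[Tuple[str, float]]]
Extractor = Callable[[bytes], Dict[str, str]]


def route_document(
    content: bytes,
    classify: Classifier,
    extractors: Dict[str, Extractor],
    min_confidence: float = 0.7,  # assumed threshold, tune for your data
) -> Tuple[str, Dict[str, str]]:
    """Classify a document, then run the extractor registered for its type."""
    predictions = classify(content)
    if not predictions:
        raise ValueError("classifier returned no predictions")

    # Pick the most confident label.
    label, confidence = max(predictions, key=lambda p: p[1])
    if confidence < min_confidence:
        raise ValueError(
            f"low-confidence classification: {label!r} ({confidence:.2f})"
        )
    if label not in extractors:
        raise KeyError(f"no extractor registered for document type {label!r}")

    # One extractor per document type, as the answer recommends.
    return label, extractors[label](content)


if __name__ == "__main__":
    # Stub processors so the sketch runs without any external service.
    def fake_classifier(content: bytes) -> List[Tuple[str, float]]:
        return [("invoice", 0.92), ("receipt", 0.08)]

    extractors = {
        "invoice": lambda content: {"total": "100.00"},
        "receipt": lambda content: {"merchant": "Example Shop"},
    }
    print(route_document(b"%PDF-...", fake_classifier, extractors))
```

In a real setup each callable would wrap a request to the corresponding processor, and documents that fall below the threshold could be sent to manual review instead of raising an error.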

huangapple
  • Posted on July 7, 2023, 02:49:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631752.html