When training and testing a Document AI project, what influences the F1 score?

Question

Using the Cloud console, I trained a model with only one field (to avoid the UI bug that was stopping training altogether) on one set of data. The model achieved an F1 score of 0.306 with 50 training images and 50 test images.

I then added 150 training images, which were predominantly auto-labelled. Most labels were fairly accurate in locating the field but hit-and-miss on the text conversion.

I deployed the model and it scored 0.17.

  1. I am currently reviewing the auto-generated labels and confirming or adjusting them (this improved the score to 0.357, so it seems the right step). Is it worthwhile to correct the converted text as well? I understand that the "Human in the Loop" step could potentially provide feedback to the system, but that these fields are not exported back to the OCR. Is that right?

  2. I also intend to increase the test set. Is it correct that if I correct the OCR value, the corrected value will be used in the test score? Will it be sent back to the system to improve future text conversion?

  3. Are the size and shape of the identified box part of the F-score in this product? If so, would selecting text with minor tweaks provide the best match to what the AI is already looking for? Many of my early boxes were created with "Add Bounding Box" and were drawn to fit the space where handwriting is expected (e.g. including the whitespace around the captured text).

Thank you

Answer 1

Score: 1

The documentation page "Evaluate the performance of processors" defines the F1 score as:

  • F1 score: the harmonic mean of precision and recall, which combines
    precision and recall into a single metric, providing equal weight to both.
    Defined as 2 * (Precision * Recall) / (Precision + Recall)
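
To make the formula concrete, here is a minimal Python sketch that computes precision, recall and F1 from raw counts. The function name and the counts in the usage line are made up for illustration and are not taken from the model above. The sketch also says nothing about how Document AI decides whether a predicted field counts as a match.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Compute F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    predicted = true_positives + false_positives   # everything the model predicted
    actual = true_positives + false_negatives      # everything in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall, as in the definition above.
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical counts: 15 correct predictions, 20 spurious, 35 missed.
print(round(f1_score(15, 20, 35), 3))
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a model with poor recall cannot score well on precision alone.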

As a note on your Human-in-the-Loop questions: the corrected values from human review are not automatically imported into the training/test datasets. They need to be imported into the processor's dataset from the HITL output bucket.

huangapple
  • Published on June 2, 2023 at 03:22:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76385072.html