What factors affect the F1 score when training and testing a Document AI project?

Question

Using the cloud console, I trained a model with only one field (to avoid the UI bug that was stopping training altogether) on one set of data. The model's F1 score was 0.306 with 50 training images and 50 test images.

I added 150 training images, which were predominantly auto-labelled. Most were fairly accurate in identifying the location, but hit and miss on the text conversion.

I deployed the model and it scored 0.17.

I am currently reviewing the auto-labelled labels and confirming or adjusting them (this improved the score to 0.357, so it seems the right step). Is it worthwhile to correct the text conversion as well? I understand that the "Human in the Loop" step could potentially provide feedback to the system, but that these fields are not exported back to the OCR?
I also intend to increase the testing set. Is it correct that if I correct the OCR value, the corrected value will be used in the testing score? Will it be sent back to the system to improve future text conversion?
Are the size and shape of the identified box part of the F-score in this product? If so, would selecting the text, with minor tweaks, provide the best match to what the AI is already looking for? Many of my early boxes were created with "Add Bounding Box" and were sized to fit the space where handwriting is expected (e.g. they include the whitespace around the captured text).

Thank you

Answer 1

Score: 1

The documentation page "Evaluate the performance of processors" defines the F1 score as:

F1 score: the harmonic mean of precision and recall, which combines
precision and recall into a single metric, providing equal weight to both.
Defined as 2 * (Precision * Recall) / (Precision + Recall)
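
To make the quoted formula concrete, here is a minimal Python sketch that computes precision, recall, and F1 from raw counts. The function names and the example counts are hypothetical and chosen only for illustration; the sketch does not model how Document AI decides whether a predicted field counts as a match, only the arithmetic that follows once those counts exist.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true positive, false positive
    and false negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, as in the quoted definition:
    2 * (Precision * Recall) / (Precision + Recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical counts: 3 correct predictions, 7 wrong predictions,
# 5 labelled fields the model missed.
precision, recall = precision_recall(tp=3, fp=7, fn=5)
print(precision, recall, f1_score(precision, recall))
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a batch of labels that lowers either precision or recall can reduce the score noticeably.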

As a note on your questions about Human-in-the-Loop: the corrected values from human review are not automatically imported into the training/test datasets. They need to be imported into the processor's dataset from the HITL output bucket.
