Question

I'm new to data science and have been tasked with extracting the serial and model numbers from images of machine faceplates, such as this (one of the cleaner images):

I have managed to OCR the text, but I am wondering whether deciding if a particular text item is a model or serial number would benefit from machine learning, or whether regular expression matching would be better.

If this is something machine learning can handle, where can I find tutorials to guide me along?

Answer 1

Score: 1

This particular example looks simple enough to handle with a simple regular expression on the OCR-extracted text. It is generally not a good idea to try to use "Machine Learning" everywhere; in this case it would be overkill.

That being said, if the images were more "complicated" (e.g. more text or numbers in the image...), you might want to solve this using more advanced methods, such as Named Entity Recognition, or with a computer vision approach by training an object detection/segmentation model.
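As a rough illustration of the regular-expression route, here is a minimal sketch in Python. It assumes the faceplate prints a label such as "MODEL NO" or "SERIAL NO" (or "S/N") right before each value; the label spellings, the `extract_fields` helper, and the sample text are all hypothetical, so the patterns would need adjusting to whatever your OCR actually returns.

```python
import re

# Assumed label spellings; real faceplates and OCR output will vary.
MODEL_RE = re.compile(
    r"\bMODEL(?:\s*(?:NO\b|NUMBER\b|#))?\.?\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]{2,})",
    re.IGNORECASE,
)
SERIAL_RE = re.compile(
    r"\b(?:SERIAL(?:\s*(?:NO\b|NUMBER\b|#))?|S/N)\.?\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]{2,})",
    re.IGNORECASE,
)


def extract_fields(ocr_text):
    """Return the first model and serial number found, or None for each."""
    model = MODEL_RE.search(ocr_text)
    serial = SERIAL_RE.search(ocr_text)
    return {
        "model": model.group(1) if model else None,
        "serial": serial.group(1) if serial else None,
    }


# Made-up OCR output, used only to exercise the patterns.
sample = "ACME PUMP CO.\nMODEL NO: XR-2000\nSERIAL NO. 8842A17\n230V 60HZ"
print(extract_fields(sample))  # {'model': 'XR-2000', 'serial': '8842A17'}
```

If the labels are missing or the OCR garbles them often, this kind of pattern matching breaks down, which is the point where the more advanced methods mentioned above start to pay off.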
