Extracting serial and model number from an image
Question
I'm new to Data Science and have been tasked with extracting the serial and model number from images of machine faceplates, such as this (one of the cleaner images):
I have managed to OCR the text, but I am wondering whether deciding if a particular text item is a model number or a serial number would benefit from machine learning, or whether regular expression matching would be better.
If this is something machine learning can handle, where can I find tutorials that can help guide me along?
Answer 1
Score: 1
This particular example looks simple enough to handle with a regular expression on the OCR-extracted text. It is generally not a good idea to reach for "Machine Learning" everywhere; in this case it would be overkill.
That being said, if the images were more "complicated" (e.g. more text or numbers in the image), you might want to solve this with more advanced methods, such as Named Entity Recognition on the OCR text, or with a computer vision approach by training an object detection/segmentation model.
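For the simple case, the regular-expression route could look roughly like the sketch below. It is only an illustration: the label spellings ("MODEL NO.", "SERIAL NO.", "S/N"), the assumption that every value contains at least one digit, the sample text, and the `extract_fields` helper are all made up here, since the actual faceplate layout is not shown. Adjust the patterns to whatever your OCR output really looks like.

```python
import re

# Hypothetical patterns: real faceplates vary, so treat these as a starting point.
# Optional label suffix after MODEL / SERIAL, e.g. "NO.", "NUMBER", "#".
LABEL = r"(?:NO\.?|NUMBER|#)?"
# Assumed value shape: 3+ letters/digits/dashes/slashes containing at least one digit.
VALUE = r"((?=[A-Z0-9/-]*\d)[A-Z0-9][A-Z0-9/-]{2,})"

MODEL_RE = re.compile(r"\bMODEL\s*" + LABEL + r"\s*[:.]?\s*" + VALUE, re.IGNORECASE)
SERIAL_RE = re.compile(
    r"\b(?:SERIAL\s*" + LABEL + r"|S/N)\s*[:.]?\s*" + VALUE, re.IGNORECASE
)


def extract_fields(ocr_text):
    """Return the first model and serial number found in OCR text, or None."""
    model = MODEL_RE.search(ocr_text)
    serial = SERIAL_RE.search(ocr_text)
    return {
        "model": model.group(1) if model else None,
        "serial": serial.group(1) if serial else None,
    }


# Made-up OCR output for illustration only.
sample = "ACME Corp\nMODEL NO. XJ-4500\nSERIAL NO: 0098-A7731\n120V 60Hz"
print(extract_fields(sample))  # {'model': 'XJ-4500', 'serial': '0098-A7731'}
```

If patterns like these start missing fields or matching the wrong text across your image set, that is a reasonable signal to look at the more advanced methods mentioned above.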