Performing OCR on Seven Segment Text with Microsoft's Computer Vision?
Question
I've been using Microsoft's Computer Vision OCR to extract text from various types of images, but I seem to have hit a bump in the road with seven-segment fonts.
It can sometimes pick them up, but it mostly gets them wrong.
I've looked around and found some alternative methods, but would rather continue using the service we already have. Any suggestions?
Answer 1
Score: 1
After a month of research and experimentation, I'm going to share my findings and solutions here in case anyone else encounters the same or a similar problem.
The Problem
I needed a reliable way to extract the temperature from multiple types of refrigeration displays. Some of these displays used a standard font that Microsoft's Computer Vision had no trouble with, while others used a seven-segment font.
Due to the nature of Optical Character Recognition (OCR), seven-segment fonts are not supported directly. To overcome this, you need to apply some image processing techniques to join the segmented text before passing it to the OCR.
Solution Overview
- Create a Custom Vision Object Detection Model to extract the display from the image.
- Develop a Custom Vision Classification Model to determine the type of display.
- Depending on the classification, pass the image either to Tesseract along with a model specialized for digital text, or to Computer Vision when dealing with standard text.
- Apply regular expressions (Regex) to the output from Tesseract to extract the desired temperature.
Solution Breakdown
First, we pass the image into our Object Detection Model.
Input: (image: original image)
Object Detection Output: (image: object detection output)
Then we pass that image into the Classification Model to determine the display type.
Classification Output: (image: classification result)
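Both models are published Custom Vision models. For reference, invoking them from Python would look roughly like the sketch below; the endpoint, key, project IDs, and published iteration names are placeholders, and in practice the classifier would receive the display crop produced by the detector rather than the full photo.

    from msrest.authentication import ApiKeyCredentials
    from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

    credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<prediction_key>"})
    predictor = CustomVisionPredictionClient("<endpoint>", credentials)

    with open("display.jpg", "rb") as image_file:
        image_data = image_file.read()

    # Object detection: locate the display in the photo
    detection = predictor.detect_image("<detector_project_id>", "<published_name>", image_data)
    best_box = max(detection.predictions, key=lambda p: p.probability).bounding_box

    # Classification: decide whether the display is 'Segmented' or standard text
    classification = predictor.classify_image("<classifier_project_id>", "<published_name>", image_data)
    display_type = max(classification.predictions, key=lambda p: p.probability).tag_name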
Next, we perform a series of image processing techniques (see the sketch after this list), including:
- Gaussian blur and conversion to grayscale: (image: blur & grayscale)
- RGB threshold to pull out the text: (image: RGB threshold)
- Erosion to connect the segmented text: (image: erosion)
- Dilation to reduce the amount of extruding pixels: (image: dilation)
- Document skew correction (via AForge.Imaging) to rotate the image to the orientation of the text: (image: document skew)
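For illustration, the steps above might be sketched with OpenCV along these lines. The kernel size, iteration counts, and threshold value are placeholders to tune per display, and the deskew step (done with AForge.Imaging in my pipeline) is omitted.

    import cv2
    import numpy as np

    def preprocess_segmented_display(bgr_image):
        # 1. Gaussian blur and convert to grayscale
        blurred = cv2.GaussianBlur(bgr_image, (5, 5), 0)
        gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)

        # 2. Threshold to pull out the text (assumes dark digits on a light
        #    background; the cutoff is a placeholder to tune per display)
        _, binary = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)

        # 3. Erosion to connect the segmented text (eroding the white
        #    background thickens the dark strokes until the segments join)
        kernel = np.ones((3, 3), np.uint8)
        joined = cv2.erode(binary, kernel, iterations=2)

        # 4. Dilation to reduce the extruding pixels left over from erosion
        cleaned = cv2.dilate(joined, kernel, iterations=1)
        return cleaned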
Since this display is classified as 'Segmented,' it then gets passed into Tesseract and analyzed using the 'LetsGoDigital' model, which is specialized for digital fonts.
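With pytesseract, that call might look like the sketch below; it assumes the LetsGoDigital traineddata file is installed in Tesseract's tessdata directory under the name 'letsgodigital'.

    import pytesseract

    # --psm 7 treats the image as a single line of text, which suits a display
    # readout; the whitelist restricts output to characters a temperature can contain.
    raw_text = pytesseract.image_to_string(
        cleaned,  # the preprocessed image from the sketch above
        lang="letsgodigital",
        config="--psm 7 -c tessedit_char_whitelist=-.0123456789",
    )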
Tesseract Output: "rawText": "- 16.-9,,,6\n\f"
After some Regex, we're left with: "value": "-16.96"
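The exact pattern depends on your displays; a minimal sketch of that cleanup, assuming the readout always shows two decimal places:

    import re

    raw_text = "- 16.-9,,,6\n\f"
    # Keep only the digits, then reattach the sign and the decimal point.
    digits = re.sub(r"[^0-9]", "", raw_text)      # "1696"
    sign = "-" if raw_text.lstrip().startswith("-") else ""
    value = f"{sign}{digits[:-2]}.{digits[-2:]}"  # "-16.96"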
Admittedly, this process isn't providing the best results, but it's sufficient to move forward. By refining the template, input images, Custom Vision Models, and the OCR process, we can expect to see better results in the future.
It would be amazing to see Seven Segment Font natively supported by Microsoft's Computer Vision, as the current solution feels somewhat hacky. I'd prefer to continue using Computer Vision instead of Tesseract or any other OCR method, considering the nature of our application.
Answer 2
Score: 0
Maybe you need to enhance the image or pre-process it so that the OCR will detect the text.
So, I used the code below to enhance the brightness and check the text recognition.
    from PIL import Image, ImageEnhance

    def convert_to_ela_image(path, quality):
        # Re-save the image as JPEG at the given quality
        resaved_filename = 'tempresaved.jpg'
        im = Image.open(path).convert('RGB')
        im.save(resaved_filename, 'JPEG', quality=quality)

        # Darken the re-saved image so the lit segments stand out for the OCR
        resaved_im = Image.open(resaved_filename)
        ela_im = ImageEnhance.Brightness(resaved_im).enhance(0.3)
        ela_im.save("./image/result.jpg", 'JPEG')
        return ela_im

    convert_to_ela_image(<image_path>, 80)
Here, you need to adjust the enhance argument in ImageEnhance.Brightness(resaved_im).enhance(0.3) for different images. I have used 0.3, which gives the altered image below.
Predictions.
pip install azure-ai-vision
Code:

    import os
    import azure.ai.vision as sdk

    service_options = sdk.VisionServiceOptions("endpoint", "key")
    vision_source = sdk.VisionSource(filename=r"./image/result.jpg")

    analysis_options = sdk.ImageAnalysisOptions()
    analysis_options.features = (
        sdk.ImageAnalysisFeature.CAPTION |
        sdk.ImageAnalysisFeature.TEXT
    )
    analysis_options.language = "en"
    analysis_options.gender_neutral_caption = True

    image_analyzer = sdk.ImageAnalyzer(service_options, vision_source, analysis_options)
    result = image_analyzer.analyze()

    if result.reason == sdk.ImageAnalysisResultReason.ANALYZED:
        if result.caption is not None:
            print(" Caption:")
            print("   '{}', Confidence {:.4f}".format(result.caption.content, result.caption.confidence))
        if result.text is not None:
            print(" Text:")
            for line in result.text.lines:
                points_string = "{" + ", ".join([str(int(point)) for point in line.bounding_polygon]) + "}"
                print("   Line: '{}', Bounding polygon {}".format(line.content, points_string))
                for word in line.words:
                    points_string = "{" + ", ".join([str(int(point)) for point in word.bounding_polygon]) + "}"
                    print("     Word: '{}', Bounding polygon {}, Confidence {:.4f}".format(word.content, points_string, word.confidence))
    else:
        error_details = sdk.ImageAnalysisErrorDetails.from_result(result)
        print(" Analysis failed.")
        print("   Error reason: {}".format(error_details.reason))
        print("   Error code: {}".format(error_details.error_code))
        print("   Error message: {}".format(error_details.message))
Output:
Using the saved image, result.jpg, in the portal.
Similarly, you need to adjust the image brightness for a correct prediction.
Again, below is an image for which I was getting the wrong output, so I altered it by trying enhance factors of 0.4 and 0.3 and comparing the results.
That image gave the correct output at 0.4, while your inputs worked with 0.3.
So, based on your input data, pre-process the image and select the enhance factor accordingly; one way to automate that choice is sketched below.
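Since no single factor suits every display, you can try a few candidates and keep the first result that looks like a temperature. A minimal sketch follows; the candidate factors and the temperature pattern are assumptions to adapt to your data, and run_ocr stands in for a hypothetical wrapper around the Image Analysis call above that returns the detected text as one string.

    import re
    from PIL import Image, ImageEnhance

    def read_temperature(path, factors=(0.3, 0.4, 0.5)):
        # Try each candidate brightness factor and keep the first OCR result
        # that looks like a signed decimal temperature, e.g. "-16.96".
        pattern = re.compile(r"-?\d+\.\d+")
        for factor in factors:
            im = Image.open(path).convert('RGB')
            ImageEnhance.Brightness(im).enhance(factor).save("./image/result.jpg", 'JPEG')
            text = run_ocr("./image/result.jpg")  # hypothetical wrapper around the call above
            match = pattern.search(text)
            if match:
                return float(match.group())
        return None  # no factor produced a readable temperature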