Performing OCR on Seven Segment Text with Microsoft's Computer Vision?
Question
I've been using Microsoft's Computer Vision OCR to extract text from various types of images, but I seem to have hit a bump in the road with seven-segment fonts.
It can sometimes pick them up, but it mostly gets them wrong.
I've looked around and found some alternative methods, but would rather continue using the service we already have. Any suggestions?
Answer 1
Score: 1
After a month of research and experimentation, I'm going to share my findings and solutions here in case anyone else encounters the same or a similar problem.
The Problem
I needed a reliable way to extract the temperature from multiple types of refrigeration displays. Some of these displays used a standard font that Microsoft's Computer Vision had no trouble with, while others used a seven-segment font.
Due to the nature of Optical Character Recognition (OCR), seven-segment fonts are not supported directly. To overcome this, you need to apply some image processing techniques to join the segmented text before passing it to the OCR.
Solution Overview
- Create a Custom Vision Object Detection Model to extract the display from the image.
- Develop a Custom Vision Classification Model to determine the type of display.
- Depending on the classification, pass the image either to Tesseract along with a model specialized for digital text, or to Computer Vision when dealing with standard text.
- Apply regular expressions (Regex) to the output from Tesseract to extract the desired temperature.
Solution Breakdown
First, we pass the image into our Object Detection Model.
Input: (image: original image)
Object Detection Output: (image: object detection output)
Then we pass that image into the Classification Model to determine the display type.
Classification Output: (image: classification result)
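Both models are published Custom Vision models. For reference, invoking them from Python would look roughly like the sketch below; the endpoint, key, project IDs, and published iteration names are placeholders, and in practice the classifier would receive the display crop produced by the detector rather than the full photo.

    from msrest.authentication import ApiKeyCredentials
    from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

    credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<prediction_key>"})
    predictor = CustomVisionPredictionClient("<endpoint>", credentials)

    with open("display.jpg", "rb") as image_file:
        image_data = image_file.read()

    # Object detection: locate the display in the photo
    detection = predictor.detect_image("<detector_project_id>", "<published_name>", image_data)
    best_box = max(detection.predictions, key=lambda p: p.probability).bounding_box

    # Classification: decide whether the display is 'Segmented' or standard text
    classification = predictor.classify_image("<classifier_project_id>", "<published_name>", image_data)
    display_type = max(classification.predictions, key=lambda p: p.probability).tag_name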
Next, we perform a series of image processing techniques (see the sketch after this list), including:
- Gaussian blur and conversion to grayscale: (image: blur & grayscale)
- RGB threshold to pull out the text: (image: RGB threshold)
- Erosion to connect the segmented text: (image: erosion)
- Dilation to reduce the amount of extruding pixels: (image: dilation)
- Document skew correction (via AForge.Imaging) to rotate the image to the orientation of the text: (image: document skew)
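For illustration, the steps above might be sketched with OpenCV along these lines. The kernel size, iteration counts, and threshold value are placeholders to tune per display, and the deskew step (done with AForge.Imaging in my pipeline) is omitted.

    import cv2
    import numpy as np

    def preprocess_segmented_display(bgr_image):
        # 1. Gaussian blur and convert to grayscale
        blurred = cv2.GaussianBlur(bgr_image, (5, 5), 0)
        gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)

        # 2. Threshold to pull out the text (assumes dark digits on a light
        #    background; the cutoff is a placeholder to tune per display)
        _, binary = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)

        # 3. Erosion to connect the segmented text (eroding the white
        #    background thickens the dark strokes until the segments join)
        kernel = np.ones((3, 3), np.uint8)
        joined = cv2.erode(binary, kernel, iterations=2)

        # 4. Dilation to reduce the extruding pixels left over from erosion
        cleaned = cv2.dilate(joined, kernel, iterations=1)
        return cleaned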
Since this display is classified as 'Segmented,' it then gets passed into Tesseract and analyzed using the 'LetsGoDigital' model, which is specialized for digital fonts.
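With pytesseract, that call might look like the sketch below; it assumes the LetsGoDigital traineddata file is installed in Tesseract's tessdata directory under the name 'letsgodigital'.

    import pytesseract

    # --psm 7 treats the image as a single line of text, which suits a display
    # readout; the whitelist restricts output to characters a temperature can contain.
    raw_text = pytesseract.image_to_string(
        cleaned,  # the preprocessed image from the sketch above
        lang="letsgodigital",
        config="--psm 7 -c tessedit_char_whitelist=-.0123456789",
    )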
Tesseract Output: "rawText": "- 16.-9,,,6\n\f"
After some Regex, we're left with: "value": "-16.96"
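The exact pattern depends on your displays; a minimal sketch of that cleanup, assuming the readout always shows two decimal places:

    import re

    raw_text = "- 16.-9,,,6\n\f"
    # Keep only the digits, then reattach the sign and the decimal point.
    digits = re.sub(r"[^0-9]", "", raw_text)      # "1696"
    sign = "-" if raw_text.lstrip().startswith("-") else ""
    value = f"{sign}{digits[:-2]}.{digits[-2:]}"  # "-16.96"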
Admittedly, this process isn't providing the best results, but it's sufficient to move forward. By refining the template, input images, Custom Vision Models, and the OCR process, we can expect to see better results in the future.
It would be amazing to see Seven Segment Font natively supported by Microsoft's Computer Vision, as the current solution feels somewhat hacky. I'd prefer to continue using Computer Vision instead of Tesseract or any other OCR method, considering the nature of our application.
Answer 2
Score: 0
Maybe you need to enhance the image or pre-process it so that the OCR will detect the text.
So, I used the code below to enhance the brightness and check the text recognition.
    from PIL import Image, ImageEnhance

    def convert_to_ela_image(path, quality):
        # Re-save the image as JPEG at the given quality
        resaved_filename = 'tempresaved.jpg'
        im = Image.open(path).convert('RGB')
        im.save(resaved_filename, 'JPEG', quality=quality)

        # Darken the re-saved image so the lit segments stand out for the OCR
        resaved_im = Image.open(resaved_filename)
        ela_im = ImageEnhance.Brightness(resaved_im).enhance(0.3)
        ela_im.save("./image/result.jpg", 'JPEG')
        return ela_im

    convert_to_ela_image(<image_path>, 80)
Here, you need to adjust the enhance argument in ImageEnhance.Brightness(resaved_im).enhance(0.3) for different images. I have used 0.3, which gives the altered image below.
Predictions.
pip install azure-ai-vision
Code:

    import os
    import azure.ai.vision as sdk

    service_options = sdk.VisionServiceOptions("endpoint", "key")
    vision_source = sdk.VisionSource(filename=r"./image/result.jpg")

    analysis_options = sdk.ImageAnalysisOptions()
    analysis_options.features = (
        sdk.ImageAnalysisFeature.CAPTION |
        sdk.ImageAnalysisFeature.TEXT
    )
    analysis_options.language = "en"
    analysis_options.gender_neutral_caption = True

    image_analyzer = sdk.ImageAnalyzer(service_options, vision_source, analysis_options)
    result = image_analyzer.analyze()

    if result.reason == sdk.ImageAnalysisResultReason.ANALYZED:
        if result.caption is not None:
            print(" Caption:")
            print("   '{}', Confidence {:.4f}".format(result.caption.content, result.caption.confidence))
        if result.text is not None:
            print(" Text:")
            for line in result.text.lines:
                points_string = "{" + ", ".join([str(int(point)) for point in line.bounding_polygon]) + "}"
                print("   Line: '{}', Bounding polygon {}".format(line.content, points_string))
                for word in line.words:
                    points_string = "{" + ", ".join([str(int(point)) for point in word.bounding_polygon]) + "}"
                    print("     Word: '{}', Bounding polygon {}, Confidence {:.4f}".format(word.content, points_string, word.confidence))
    else:
        error_details = sdk.ImageAnalysisErrorDetails.from_result(result)
        print(" Analysis failed.")
        print("   Error reason: {}".format(error_details.reason))
        print("   Error code: {}".format(error_details.error_code))
        print("   Error message: {}".format(error_details.message))
Output:
Using the saved image, result.jpg, in the portal.
Similarly, you need to adjust the image brightness for a correct prediction.
Again, below is an image for which I was getting the wrong output, so I altered it by trying enhance factors of 0.4 and 0.3 and comparing the results.
That image gave the correct output at 0.4, while your inputs worked with 0.3.
So, based on your input data, pre-process the image and select the enhance factor accordingly; one way to automate that choice is sketched below.
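Since no single factor suits every display, you can try a few candidates and keep the first result that looks like a temperature. A minimal sketch follows; the candidate factors and the temperature pattern are assumptions to adapt to your data, and run_ocr stands in for a hypothetical wrapper around the Image Analysis call above that returns the detected text as one string.

    import re
    from PIL import Image, ImageEnhance

    def read_temperature(path, factors=(0.3, 0.4, 0.5)):
        # Try each candidate brightness factor and keep the first OCR result
        # that looks like a signed decimal temperature, e.g. "-16.96".
        pattern = re.compile(r"-?\d+\.\d+")
        for factor in factors:
            im = Image.open(path).convert('RGB')
            ImageEnhance.Brightness(im).enhance(factor).save("./image/result.jpg", 'JPEG')
            text = run_ocr("./image/result.jpg")  # hypothetical wrapper around the call above
            match = pattern.search(text)
            if match:
                return float(match.group())
        return None  # no factor produced a readable temperature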