How to extract text from image after applying contour in python?

# Question
"So I have applied contouring on a big image and reached the following cropped part of the image:
But now without using any machine learning model, how do I actually get the image to a text variable? I came to know about template matching but I do not understand how do I proceed from here. I do have images of letters and numbers (named according to their image value) stored in a directory, but how do I match each of them and get the text as a string? I don't want to use any ML model or library like pyTesseract.
I would appreciate any help.
Edit:
Here is the code I have tried for template matching:
```python
import os
import cv2
import numpy as np
from imutils import contours  # provides sort_contours

def templateMatch(image):
    path = "location"  # directory containing the "characters-images" folder
    for image_path in os.listdir(path + "/characters-images"):
        template = cv2.imread(os.path.join(path, "characters-images", image_path))
        template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
        template = template.astype(np.uint8)
        image = image.astype(np.uint8)
        res = cv2.matchTemplate(template, image, cv2.TM_SQDIFF_NORMED)
        mn, _, mnLoc, _ = cv2.minMaxLoc(res)
        # note: res is never None here, so this returns on the first template
        # without ever comparing match scores
        if res is not None:
            return image_path.replace(".bmp", "")

def match(image):
    plate = ""
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    (cnts, _) = contours.sort_contours(cnts, method="left-to-right")
    for con in cnts:
        area = cv2.contourArea(con)
        if 800 > area > 200:
            x, y, w, h = cv2.boundingRect(con)
            temp = thresh[y:y+h, x:x+w]
            character = templateMatch(temp)  # was "templateMatching", which is undefined
            if character is not None:
                plate += character
    return plate
```
# Answer 1

**Score:** 3
> How do I actually get the image into a text variable? I came to know about template matching, but I do not understand how to proceed from here.

Template matching is used to locate an object in an image given a template, ***not*** to extract text from an image. Matching a template with the position of the object in the image will not help you get the text as a string. For examples of how to apply dynamic, scale-variant template matching, take a look at [how to isolate everything inside of a contour, scale it, and test the similarity to an image?](https://stackoverflow.com/questions/59401389/how-to-isolate-everything-inside-of-a-contour-scale-it-and-test-the-similarity/59402625) and [Python OpenCV line detection to detect X symbol in image](https://stackoverflow.com/questions/58837175/python-opencv-line-detection-to-detect-x-symbol-in-image). I don't understand why you wouldn't want to use an OCR library. If you want to extract text from the image as a string variable, you should use some type of deep/machine learning. PyTesseract is probably the easiest. Here's a solution using PyTesseract.
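To make the distinction concrete, here is roughly what template matching *does* give you: a location in the image, not a string. (A minimal sketch; `scene.png` and `char.png` are placeholder filenames.)

```python
import cv2

# Template matching returns *where* the template best fits, not what the text says
image = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)     # placeholder filenames
template = cv2.imread('char.png', cv2.IMREAD_GRAYSCALE)

res = cv2.matchTemplate(image, template, cv2.TM_SQDIFF_NORMED)
min_val, _, min_loc, _ = cv2.minMaxLoc(res)  # best match is the minimum for SQDIFF
h, w = template.shape
print('best match at top-left corner', min_loc, 'with score', min_val)
```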
---
The idea is to obtain a binary image using Otsu's threshold, then perform contour area and aspect-ratio filtering to extract the letter/number ROIs. From there, we use NumPy slicing to crop each ROI onto a blank mask and then apply OCR using Pytesseract. Here's a visualization of each step:
Binary image
<img src="https://i.stack.imgur.com/aZ2l3.png" width="450">
Detected ROIs highlighted in green
<img src="https://i.stack.imgur.com/8BQbA.png" width="450">
Isolated ROIs on a blank mask ready for OCR
<img src="https://i.stack.imgur.com/0DXOS.png" width="450">
We use the `--psm 6` configuration option to tell Pytesseract to assume a uniform block of text. Look [here for more configuration options](https://stackoverflow.com/questions/44619077/pytesseract-ocr-multiple-config-options). Result from Pytesseract:
> XS NB 23
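As an aside, if `--psm 6` does not fit a given layout, it is cheap to compare a few page segmentation modes on the same input. (A quick sketch reusing `mask` and the `pytesseract` import from the code below; the mode numbers are standard Tesseract options.)

```python
# Quick sketch: compare a few Tesseract page segmentation modes
# (6 = uniform block of text, 7 = single text line, 11 = sparse text)
for psm in (6, 7, 11):
    text = pytesseract.image_to_string(mask, lang='eng', config=f'--psm {psm}')
    print(psm, repr(text))
```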
Code

```python
import cv2
import numpy as np
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, create mask, grayscale, Otsu's threshold
image = cv2.imread('1.png')
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Filter for ROIs using contour area and aspect ratio
cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)
    x, y, w, h = cv2.boundingRect(approx)
    aspect_ratio = w / float(h)
    if area > 2000 and aspect_ratio > .5:
        mask[y:y+h, x:x+w] = image[y:y+h, x:x+w]

# Perform OCR with Pytesseract
data = pytesseract.image_to_string(mask, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('mask', mask)
cv2.waitKey()
```
# Answer 2

**Score:** 0
An option is to consider the bounding box around each character and to compute a correlation score between the character at hand and those in the training set, keeping the label with the largest correlation score. (Use SAD, SSD, normalized grayscale correlation, or just the Hamming distance if you work on a binary image.)

You will need a suitable strategy to ensure that the tested characters and the learnt characters have compatible sizes and are properly overlaid.
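A minimal sketch of this idea, under stated assumptions: the `characters-images` directory layout and file naming follow the question, the 32×32 normalization size is arbitrary, and a Hamming distance on binarized pixels stands in for the correlation score (so the *smallest* distance wins rather than the largest correlation).

```python
import os
import cv2
import numpy as np

SIZE = (32, 32)  # assumed common size; pick to match your template resolution

def load_templates(directory):
    """Load reference characters; each file is named after its character, e.g. 'A.bmp'."""
    templates = {}
    for name in os.listdir(directory):
        img = cv2.imread(os.path.join(directory, name), cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue  # skip non-image files
        img = cv2.resize(img, SIZE)
        _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        templates[os.path.splitext(name)[0]] = img
    return templates

def classify(glyph, templates):
    """Return the template label with the smallest Hamming distance to the glyph."""
    glyph = cv2.resize(glyph, SIZE)  # normalize size so pixels overlay properly
    _, glyph = cv2.threshold(glyph, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    best_label, best_dist = None, float('inf')
    for label, tmpl in templates.items():
        dist = np.count_nonzero(glyph != tmpl)  # Hamming distance on binary pixels
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Usage: crop each character with its bounding box, then classify it
# templates = load_templates('characters-images')  # hypothetical directory
# text = ''.join(classify(crop, templates) for crop in character_crops)
```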