Extract individual fields from a table image to Excel with OCR

Question

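I have scanned images that contain tables, as shown in this image:

[![scanned image with handwritten digits and printed information][1]][1]

I am trying to extract each box separately and perform OCR, but when I detect the horizontal and vertical lines and then detect the boxes, I get the following image:

[![enter image description here][2]][2]

When I apply other transformations to isolate the text (erode and dilate), remnants of the lines still come through along with the text, as shown below:

[![dilated text and lines][3]][3]

I cannot isolate the text alone for OCR, and proper bounding boxes are not generated, as shown below:

[![Image with detected boxes][4]][4]

I cannot get clearly separated boxes from the real lines. I tried the same approach on an image that was edited in Paint (shown below) to add digits, and it works.

[![enter image description here][5]][5]

I don't know which part I'm doing wrong, but if there is anything I should try, or anything I should change or add in my question, please tell me.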

    # Load all required libraries (%pylab is a Jupyter/IPython magic)
    %pylab inline
    import cv2
    import numpy as np
    import pandas as pd
    import pytesseract
    import matplotlib.pyplot as plt
    import statistics
    from time import sleep
    import random

    img = cv2.imread('images/scan1.jpg', 0)

    # Add a white border around the image
    img1 = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255, 255])

    # Threshold the image (with THRESH_OTSU the threshold is chosen automatically)
    (thresh, th3) = cv2.threshold(img1, 255, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # Invert the pixel values
    th3 = 255 - th3

    # Initialize the kernels used to detect the table borders
    if th3.shape[0] < 1000:
        ver = np.ones((7, 1), np.uint8)     # 7 rows x 1 column
        hor = np.ones((1, 6), np.uint8)     # 1 row x 6 columns
    else:
        ver = np.ones((19, 1), np.uint8)    # 19 rows x 1 column
        hor = np.ones((1, 15), np.uint8)    # 1 row x 15 columns

    # Detect the vertical lines of the table borders
    img_temp1 = cv2.erode(th3, ver, iterations=3)
    verticle_lines_img = cv2.dilate(img_temp1, ver, iterations=3)

    # Detect the horizontal lines of the table borders
    img_hor = cv2.erode(th3, hor, iterations=3)
    hor_lines_img = cv2.dilate(img_hor, hor, iterations=4)

    # Combine the horizontal and vertical lines
    hor_ver = cv2.add(hor_lines_img, verticle_lines_img)

    hor_ver = 255 - hor_ver

    # Subtract the table borders from the image
    temp = cv2.subtract(th3, hor_ver)

    temp = 255 - temp

    # XOR to erase the table borders
    tt = cv2.bitwise_xor(img1, temp)

    iii = cv2.bitwise_not(tt)

    tt1 = iii.copy()

    # Kernel initialization
    ver1 = np.ones((9, 2), np.uint8)
    hor1 = np.ones((2, 10), np.uint8)

    # Morphological operations
    temp1 = cv2.erode(tt1, ver1, iterations=2)
    verticle_lines_img1 = cv2.dilate(temp1, ver1, iterations=1)

    temp12 = cv2.erode(tt1, hor1, iterations=1)
    hor_lines_img2 = cv2.dilate(temp12, hor1, iterations=1)

    # OR the results to keep only the text and remove everything else
    hor_ver = cv2.add(hor_lines_img2, verticle_lines_img1)
    dim1 = (hor_ver.shape[1], hor_ver.shape[0])
    dim = (hor_ver.shape[1] * 2, hor_ver.shape[0] * 2)

    # Resize the image to twice its size to enlarge the text
    resized = cv2.resize(hor_ver, dim, interpolation=cv2.INTER_AREA)

    # Invert the pixel values so that erosion and dilation can be applied
    want = cv2.bitwise_not(resized)

    if want.shape[0] < 1000:
        kernel1 = np.ones((1, 3), np.uint8)
        kernel2 = np.ones((2, 2), np.uint8)
        kernel3 = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], np.uint8)
    else:
        kernel1 = np.ones((1, 6), np.uint8)
        kernel2 = np.ones((4, 5), np.uint8)

    tt1 = cv2.dilate(want, kernel1, iterations=2)

    # Resize the image back to its original size
    resized1 = cv2.resize(tt1, dim1, interpolation=cv2.INTER_AREA)

    # Find the contours in the image, which will detect all the boxes
    contours1, hierarchy1 = cv2.findContours(resized1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # Function to sort contours by bounding-box position (left-to-right by default)
    def sort_contours(cnts, method="left-to-right"):
        # initialize the reverse flag and sort index
        reverse = False
        i = 0

        # handle if we need to sort in reverse
        if method == "right-to-left" or method == "bottom-to-top":
            reverse = True
        # handle if we are sorting against the y-coordinate rather than
        # the x-coordinate of the bounding box
        if method == "top-to-bottom" or method == "bottom-to-top":
            i = 1
        # construct the list of bounding boxes and sort them by the
        # selected coordinate
        boundingBoxes = [cv2.boundingRect(c) for c in cnts]
        (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
                                            key=lambda b: b[1][i], reverse=reverse))
        # return the list of sorted contours and bounding boxes
        return (cnts, boundingBoxes)

    # Sort the contours by calling the function
    (cnts, boundingBoxes) = sort_contours(contours1, method="top-to-bottom")

    # Store the height of every bounding box
    heightlist = []
    for i in range(len(boundingBoxes)):
        heightlist.append(boundingBoxes[i][3])

    # Sort the height values
    heightlist.sort()

    sportion = int(.5 * len(heightlist))
    eportion = int(0.05 * len(heightlist))

    # Take the 50% to 95% range of the heights and calculate their mean;
    # this neglects small bounding boxes, which are basically noise
    try:
        medianheight = statistics.mean(heightlist[-sportion:-eportion])
    except statistics.StatisticsError:
        medianheight = statistics.mean(heightlist[-sportion:-2])

    # Keep the bounding boxes whose height is at least 70% of the mean height
    # and whose width-to-height ratio is greater than 0.9
    box = []
    imag = iii.copy()
    for i in range(len(cnts)):
        cnt = cnts[i]
        x, y, w, h = cv2.boundingRect(cnt)
        if h >= .7 * medianheight and w / h > 0.9:
            image = cv2.rectangle(imag, (x + 4, y - 2), (x + w - 5, y + h), (0, 255, 0), 1)
            box.append([x, y, w, h])
    # to show image

Now we have the badly detected boxes image shown above.

  [1]: https://i.stack.imgur.com/c0aTd.jpg
  [2]: https://i.stack.imgur.com/5Y8do.jpg
  [3]: https://i.stack.imgur.com/uuuwF.jpg
  [4]: https://i.stack.imgur.com/jW3rf.jpg
  [5]: https://i.stack.imgur.com/U1ZXg.jpg
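
A side note on the height filter in the question's code: when the list is short, `int(0.05 * len(heightlist))` is 0, so `heightlist[-sportion:-eportion]` becomes an empty slice and only the fallback branch produces a value. The following is a minimal sketch of the same idea using positive indices, which selects roughly the same 50% to 95% window. The helper name `typical_box_height` and the sample heights are hypothetical and not part of the original code.

```python
import statistics


def typical_box_height(heights, lower=0.50, upper=0.95):
    """Mean of the sorted heights between the `lower` and `upper` positions."""
    ordered = sorted(heights)
    start = int(lower * len(ordered))
    # Always keep at least one element in the window.
    stop = max(start + 1, int(upper * len(ordered)))
    return statistics.mean(ordered[start:stop])


# Hypothetical heights: small noise boxes, real cells, and one oversized box.
heights = [2, 3, 3, 4, 28, 29, 30, 30, 31, 90]
print(typical_box_height(heights))
```

With positive indices the window never collapses because of a zero end index, so no `try`/`except` fallback is needed for non-empty input.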
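# Answer 1
**Score**: 5
You're on the right track. Here's a continuation of your approach with slight modifications. The idea is:

1. **Obtain a binary image.** Load the image, convert it to grayscale, and apply Otsu's threshold.
2. **Remove all character text contours.** Create a rectangular kernel and perform a morphological opening to keep only the horizontal/vertical lines. This effectively turns the text into tiny noise, so we find contours and filter by contour area to remove them.
3. **Repair the horizontal/vertical lines and extract each ROI.** Morph close to fix any broken lines and smooth the table. From here, find the contours and sort the box-field contours using `imutils.contours.sort_contours()` with the `top-to-bottom` method. Then filter by contour area and extract each ROI.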
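
To illustrate why the opening in step 2 suppresses text while keeping the table lines, here is a minimal pure-Python sketch of a binary opening with a rectangular kernel. It is only an illustration under simplifying assumptions: `binary_open` and the toy grid are hypothetical, pixels outside the grid are treated as background, and `cv2.morphologyEx` may handle borders differently.

```python
def binary_open(grid, kh, kw):
    """Opening of a 0/1 grid with a kh x kw rectangle.

    A pixel survives only if it lies inside some kh x kw window
    that is completely filled with foreground pixels.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for top in range(rows - kh + 1):
        for left in range(cols - kw + 1):
            window = [(r, c)
                      for r in range(top, top + kh)
                      for c in range(left, left + kw)]
            if all(grid[r][c] for r, c in window):
                for r, c in window:
                    out[r][c] = 1
    return out


# Toy image: a 3-pixel-thick table line on top, a thin 4-pixel stroke below.
grid = [
    [1] * 9,
    [1] * 9,
    [1] * 9,
    [0] * 9,
    [0, 0, 1, 1, 1, 1, 0, 0, 0],
    [0] * 9,
]

for row in binary_open(grid, 3, 3):
    print("".join("#" if v else "." for v in row))
```

In this toy grid the thick line survives the 3 x 3 opening and the thin stroke disappears. On a real scan, strokes that are at least as thick as the kernel survive as small blobs, which is why the answer follows the opening with a contour-area filter.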
Here's a visualization of each box field and the extracted ROI:
[![enter image description here](https://i.stack.imgur.com/yRYX7.gif)](https://i.stack.imgur.com/yRYX7.gif)
Code
```python
import cv2
import numpy as np
from imutils import contours

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Remove text characters with morph open and contour filtering
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
cnts = cv2.findContours(opening, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 2 values in OpenCV 4.x and 3 values in OpenCV 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 500:
        cv2.drawContours(opening, [c], -1, (0, 0, 0), -1)

# Repair table lines, sort contours, and extract ROI
close = 255 - cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel, iterations=1)
cnts = cv2.findContours(close, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
for c in cnts:
    area = cv2.contourArea(c)
    if area < 25000:
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36, 255, 12), -1)
        ROI = original[y:y+h, x:x+w]

        # Visualization
        cv2.imshow('image', image)
        cv2.imshow('ROI', ROI)
        cv2.waitKey(20)

cv2.imshow('opening', opening)
cv2.imshow('close', close)
cv2.imshow('image', image)
cv2.waitKey()
```

# Answer 2
**Score**: 2

nanthancy's answer is also accurate, I used the following script for getting each box and sorting it by columns and rows.

Note: Most of this code is from a medium blog by Kanan Vyas here: https://medium.com/coinmonks/a-box-detection-algorithm-for-any-image-containing-boxes-756c15d7ed26

    # Most of this code is taken from the blog post by Kanan Vyas:
    # https://medium.com/coinmonks/a-box-detection-algorithm-for-any-image-containing-boxes-756c15d7ed26
    import cv2
    import numpy as np

    img = cv2.imread('images/scan2.jpg', 0)

    # Show an image with cv2 and close the window on any key press
    def imshow(img, label='default'):
        cv2.imshow(label, img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # Threshold the image
    (thresh, img_bin) = cv2.threshold(img, 250, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Invert the image
    img_bin = 255 - img_bin

    # Define a kernel length
    kernel_length = np.array(img).shape[1] // 80
    # A vertical kernel (1 wide, kernel_length tall) to detect the vertical lines
    verticle_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, kernel_length))
    # A horizontal kernel (kernel_length wide, 1 tall) to detect the horizontal lines
    hori_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_length, 1))
    # A 3 x 3 kernel of ones
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

    # Morphological operation to detect vertical lines
    img_temp1 = cv2.erode(img_bin, verticle_kernel, iterations=3)
    verticle_lines_img = cv2.dilate(img_temp1, verticle_kernel, iterations=3)
    # cv2.imwrite("verticle_lines.jpg", verticle_lines_img)

    # Morphological operation to detect horizontal lines
    img_temp2 = cv2.erode(img_bin, hori_kernel, iterations=3)
    horizontal_lines_img = cv2.dilate(img_temp2, hori_kernel, iterations=3)
    # cv2.imwrite("horizontal_lines.jpg", horizontal_lines_img)

    # Weighting parameters: how much of each image goes into the combined image
    alpha = 0.5
    beta = 1.0 - alpha
    # Add the two images with these weights to get a third image
    img_final_bin = cv2.addWeighted(verticle_lines_img, alpha, horizontal_lines_img, beta, 0.0)
    img_final_bin = cv2.erode(~img_final_bin, kernel, iterations=2)
    (thresh, img_final_bin) = cv2.threshold(img_final_bin, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    cv2.imwrite("img_final_bin.jpg", img_final_bin)

    # Find contours in the image, which will detect all the boxes
    contours, hierarchy = cv2.findContours(img_final_bin, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # The posted snippet uses boundingBoxes without defining it; it is assumed
    # here to be the (x, y, w, h) bounding box of every contour
    boundingBoxes = [cv2.boundingRect(c) for c in contours]

    """ This section saves each extracted box as a separate image.
    idx = 0
    for c in contours:
        # Returns the location and width, height for every contour
        x, y, w, h = cv2.boundingRect(c)
        # only selecting boxes within a certain width/height range
        if (w > 10 and h > 15 and h < 50):
            idx += 1
            new_img = img[y:y+h, x:x+w]
            #cv2.imwrite("kanan/1/"+ "{}-{}-{}-{}".format(x, y, w, h) + '.jpg', new_img)
    """

    # Get the sorted set of all y-coordinates to sort boxes row-wise
    def getsety(boxes):
        ally = []
        for b in boxes:
            ally.append(b[1])
        ally = set(ally)
        ally = sorted(ally)
        return ally

    # Collect the boxes whose y lies within a range, because if the image is
    # tilted, boxes in the same row can have different y values within a certain range
    def sort_boxes(boxes, y, row_column):
        l = []
        for b in boxes:
            if (b[2] > 10 and b[3] > 15 and b[3] < 50):
                if b[1] >= y - 7 and b[1] <= y + 7:
                    l.append(b)

        if l in row_column:
            return row_column
        else:
            row_column.append(l)
        return row_column

    # Sort each row by the x of each box to order it column-wise
    def sortrows(rc):
        new_rc = []
        for row in rc:
            r_new = sorted(row, key=lambda cell: cell[0])
            new_rc.append(r_new)
        return new_rc

    row_column = []
    for i in getsety(boundingBoxes):
        row_column = sort_boxes(boundingBoxes, i, row_column)
    row_column = [i for i in row_column if i != []]

    # Final list of sorted boxes, from top left to bottom right
    row_column = sortrows(row_column)
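
One thing to watch with `sort_boxes` above: it builds a group around every distinct y value, so when the y values in a row drift by more than the tolerance (for example 100, 107, 114), the same box can end up in more than one group. The following is a small sketch of an alternative that assigns each box to exactly one row. It assumes boxes are `(x, y, w, h)` tuples as returned by `cv2.boundingRect`; the function name `group_boxes_into_rows` and the sample boxes are hypothetical.

```python
def group_boxes_into_rows(boxes, y_tol=7):
    """Group (x, y, w, h) boxes into rows, each row sorted left to right."""
    rows = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        # Start a new row when this box sits more than y_tol below the
        # first (top-most) box of the current row.
        if not rows or box[1] - rows[-1][0][1] > y_tol:
            rows.append([box])
        else:
            rows[-1].append(box)
    return [sorted(row, key=lambda b: b[0]) for row in rows]


# Hypothetical boxes from a slightly tilted table with 2 rows and 3 columns.
sample = [(120, 52, 40, 30), (10, 50, 40, 30), (65, 51, 40, 30),
          (10, 95, 40, 30), (120, 98, 40, 30), (65, 96, 40, 30)]
for row in group_boxes_into_rows(sample):
    print([b[0] for b in row])
```

The size filter from `sort_boxes` (`w > 10`, `15 < h < 50`) is left out here and would need to be applied before grouping.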

I made this in Jupyter notebook and copy-pasted here, if any errors come up, let me know.

Thank you everyone for answers

# Answer 3

**Score**: 0

This is a function that uses tesseract-ocr for layout detection. You can try different RIL levels and PSMs. For more details, have a look here: https://github.com/sirfz/tesserocr

import os
import platform
from typing import List, Tuple

from tesserocr import PyTessBaseAPI, iterate_level, RIL

system = platform.system()
if system == 'Linux':
    tessdata_folder_default = ''
elif system == 'Windows':
    tessdata_folder_default = r'C:\Program Files (x86)\Tesseract-OCR\tessdata'
else:
    raise NotImplementedError

# this tesseract specific env variable takes precedence for tessdata folder location selection
# especially important for windows, as we don't know if we're running 32 or 64bit tesseract
tessdata_folder = os.getenv('TESSDATA_PREFIX', tessdata_folder_default)


def get_layout_boxes(input_image,  # PIL image object
                     level: RIL,
                     include_text: bool,
                     include_boxes: bool,
                     language: str,
                     psm: int,
                     tessdata_path='') -> List[Tuple]:
    """
    Get image components coordinates. It will return also text if include_text is True.
    :param input_image: input PIL image
    :param level: page iterator level, please see "RIL" enum
    :param include_text: if True return boxes texts
    :param include_boxes: if True return boxes coordinates
    :param language: language for OCR
    :param psm: page segmentation mode, by default it is PSM.AUTO which is 3
    :param tessdata_path: the path to the tessdata folder
    :return: list of tuples: [((x1, y1, x2, y2), text)), ...]
    """
    assert any((include_text, include_boxes)), (
        'Both include_text and include_boxes can not be False.')

    if not tessdata_path:
        tessdata_path = tessdata_folder

    try:
        with PyTessBaseAPI(path=tessdata_path, lang=language) as api:
            api.SetImage(input_image)

            api.SetPageSegMode(psm)
            api.Recognize()
            page_iterator = api.GetIterator()
            data = []
            for pi in iterate_level(page_iterator, level):
                bounding_box = pi.BoundingBox(level)
                if bounding_box is not None:
                    text = pi.GetUTF8Text(level) if include_text else None
                    box = bounding_box if include_boxes else None
                    data.append((box, text))
            return data
    except RuntimeError:
        print('Please specify correct path to tessdata.')
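
To get the result into a spreadsheet, the `(box, text)` tuples documented in the docstring above could be written as CSV, which Excel can open. This is a minimal sketch using only the standard library; the function name `layout_boxes_to_csv` and the sample data are hypothetical, and entries without a box are skipped because they cannot be ordered on the page.

```python
import csv
import io


def layout_boxes_to_csv(data, out_file):
    """Write [((x1, y1, x2, y2), text), ...] tuples as CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["x1", "y1", "x2", "y2", "text"])
    # Reading order: top to bottom, then left to right.
    ordered = sorted(
        (item for item in data if item[0] is not None),
        key=lambda item: (item[0][1], item[0][0]),
    )
    for (x1, y1, x2, y2), text in ordered:
        # text is None when include_text was False
        writer.writerow([x1, y1, x2, y2, (text or "").strip()])


# Hypothetical result, shaped like the documented return value.
sample = [((200, 10, 260, 40), "42\n"),
          ((10, 10, 150, 40), "Name\n"),
          ((10, 60, 150, 90), None)]
buffer = io.StringIO()
layout_boxes_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file, one option is `open('boxes.csv', 'w', newline='', encoding='utf-8-sig')`; the byte order mark written by `utf-8-sig` generally helps Excel detect the encoding.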


huangapple
  • Published on January 3, 2020, 22:28:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59580304.html