Identify Table Cells Individually (Separately) using Python

Question

I have a table with clear vertical and horizontal grid lines (the grid lines are sometimes black and sometimes white; this can be known in advance).

I'm trying to find a way to locate each cell in a photo of the table individually. Each cell has different properties (text, color, number, link, etc.), and I want to allow the user to perform some analysis on each cell before submitting.

When I show the user a given cell, I will also show them the first cell of that row and the header cell of that column.

I've been searching the internet for the past 2-3 hours and found nothing. My code has gotten me nowhere so far, so there is no point in pasting it.

Some links I've tried:

  1. Generating tables with images in cells using Python
  2. Split cells from an image of a table
  3. Recognize cells from a table with Open CV and display the recognition result in QGraphicsView of PyQt5
  4. How to get individual cell in table using Pandas?
  5. Detect columns from a table with opencv
  6. detect a table part from entire image in python

Most of them only work for extracting textual data from an image, but I do not want to extract any data. As a start, I simply want to detect all of the cells in the table and show each of them to the user. For example, in a loop, show each cell (as is, without modification) together with the corresponding header and row index, meaning each iteration shows the user three things: {1} the cell, {2} the header, {3} the row index.

Example image (the actual data is classified, so I found an image on Google to show the principle I'm looking for):

Image Link

> I know I didn't paste any code; that's because none of my attempts worked even a little, and I really have no idea what to do...
>
> If you think there is anything I can do to improve my question, please tell me.

Answer 1

Score: 2

(I've edited my answer to account for the fact that you can't remove the background colors)

The following code gives you the cells:

    import cv2 as cv
    import numpy as np

    img = cv.imread("your_img.png", cv.IMREAD_GRAYSCALE)
    # Detect edges in the image. This picks up both the table cells and the
    # text; the text is removed later.
    edges = cv.Canny(img, 10, 20)
    # Make the edges a bit thicker
    edges = cv.dilate(edges, np.ones((3, 3), np.uint8))

    # Invert the image so the cell interiors become white
    _, edges = cv.threshold(edges, 127, 255, cv.THRESH_BINARY_INV)
    # Find the contours of the cells (two return values in OpenCV 4.x)
    conts, _ = cv.findContours(edges, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    # Take the convex hull of each contour to remove the text
    conts = [cv.convexHull(cont) for cont in conts]
    # Filter out noise (small contours)
    conts = [cont for cont in conts if cv.contourArea(cont) > 100]
    # Draw the remaining contours on a blank image (pixel value 1, thickness 1)
    edges = cv.drawContours(edges * 0, conts, -1, (1,), 1)

Here's the final image.
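The question also asks for a loop that shows each cell together with its column header and row index. A minimal sketch of that step, which is not part of the original answer, could start from the bounding boxes of the contours, for example `boxes = [cv.boundingRect(c) for c in conts]`. The helper names `group_into_rows` and `iter_cells` and the `y_tol` tolerance are hypothetical, and the sketch assumes that the first row holds the headers, the first column holds the row index, and any contour that is not a cell has already been filtered out.

```python
from typing import Iterator, List, Tuple

# (x, y, w, h), the same layout that cv.boundingRect returns
Box = Tuple[int, int, int, int]


def group_into_rows(boxes: List[Box], y_tol: int = 10) -> List[List[Box]]:
    """Group boxes into rows (top to bottom), each row sorted left to right.

    y_tol is an assumed pixel tolerance: a box joins the current row when its
    top edge is within y_tol of the top edge of that row's first box.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= y_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [sorted(row, key=lambda b: b[0]) for row in rows]


def iter_cells(rows: List[List[Box]]) -> Iterator[Tuple[Box, Box, Box]]:
    """Yield (cell, column_header, row_index) boxes for every body cell.

    Assumes rows[0] is the header row and column 0 is the row index.
    """
    if not rows:
        return
    header_row = rows[0]
    for row in rows[1:]:
        # Guard against rows that ended up with a different number of boxes
        for col in range(1, min(len(row), len(header_row))):
            yield row[col], header_row[col], row[0]


if __name__ == "__main__":
    # Synthetic 3x3 grid standing in for the boxes of the detected contours
    boxes = [(x, y, 48, 28) for y in (0, 30, 60) for x in (0, 50, 100)]
    for cell, header, row_index in iter_cells(group_into_rows(boxes)):
        print("cell:", cell, "header:", header, "row index:", row_index)
```

Each yielded box can then be cropped from the unmodified image with NumPy slicing, e.g. `img[y:y + h, x:x + w]`, and shown to the user. The tolerance would need tuning for photos where the table is skewed or the rows are very narrow.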

huangapple
  • Posted on April 17, 2023, 16:16:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76033027.html