Identify Table Cells Individually (Separately) using Python
Question
I have a table with clear vertical and horizontal grid lines (the grid lines are sometimes black and sometimes white; which one is known in advance).
I'm trying to find a way to locate each cell in a photo of the table individually. Each cell has different properties (text, color, number, link, etc.), and I want to let the user perform some analysis on each cell before submitting.
When I show the user a given cell, I will also show them the first cell of that row and the header cell of that column.
I've been searching the internet for the past 2-3 hours and found nothing. My code hasn't gotten me anywhere yet, so there is no point in pasting it.
Some links I've tried:
- Generating tables with images in cells using Python
- Split cells from an image of a table
- Recognize cells from a table with Open CV and display the recognition result in QGraphicsView of PyQt5
- How to get individual cell in table using Pandas?
- Detect columns from a table with opencv
- detect a table part from entire image in python
Most of them only work for extracting textual data from an image, but I do not want to extract any data. As a start, I simply want to detect all of the cells in the table and display each of them to the user (for example, in a loop, show each cell [as is, without modification] together with the corresponding header and row index, so each iteration shows the user 3 things: {1} the cell, {2} the header, {3} the row index).
Example image (the actual data is classified, so I found a Google image to show the principle I'm looking for):
> I know I didn't paste any code; that's because none of my attempts worked even a little bit, and I really have no idea what to do...
>
> If you think there is anything I can do to improve my question, please tell me.
Answer 1
Score: 2
(I've edited my answer to account for the fact that you can't remove the background colors)
The following code gives you the cells:
import cv2 as cv
import numpy as np

img = cv.imread("your_img.png", cv.IMREAD_GRAYSCALE)
## detect edges in the image. This picks up both the table cells and the text;
## the text will be removed later
edges = cv.Canny(img, 10, 20)
## make the edges a bit thicker
edges = cv.dilate(edges, np.ones((3, 3), np.uint8))
## invert the image so the cell interiors become white regions
_, edges = cv.threshold(edges, 127, 255, cv.THRESH_BINARY_INV)
## find the outer contours of the cells
conts, _ = cv.findContours(edges, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
## take the convex hull of each contour to remove the text
conts = [cv.convexHull(cont) for cont in conts]
## filter out noise (contours with a small area)
conts = [cont for cont in conts if cv.contourArea(cont) > 100]
## draw the cell outlines on a blank image (pixel value 1, line thickness 1)
edges = cv.drawContours(edges * 0, conts, -1, (1,), 1)
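To get from these contours to what the question asks for (each cell shown together with its column header and its row label), the cells still have to be arranged into a grid. Below is a minimal sketch of one way to do that, not part of the original answer. It assumes each contour has first been reduced to a bounding box with `cv.boundingRect(cont)`, which returns `(x, y, w, h)`, and that the table is a regular grid with no merged cells. The helper names `group_into_rows` and `iter_cells` and the `row_tol` tolerance are illustrative, not OpenCV functions.

```python
from typing import Iterator, List, Tuple

# (x, y, w, h), the layout returned by cv.boundingRect
Box = Tuple[int, int, int, int]


def group_into_rows(boxes: List[Box], row_tol: int = 10) -> List[List[Box]]:
    """Group boxes into rows (top to bottom), each row sorted left to right.

    Two boxes are placed in the same row when their top edges differ by at
    most row_tol pixels (an assumed tolerance; tune it to the image).
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        # Compare with the topmost box of the current row
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [sorted(row, key=lambda b: b[0]) for row in rows]


def iter_cells(rows: List[List[Box]]) -> Iterator[Tuple[Box, Box, Box]]:
    """Yield (cell, column_header, row_label) for every body cell.

    Assumes the first row holds the column headers and the first box of
    each row is that row's label.
    """
    header = rows[0]
    for row in rows[1:]:
        row_label = row[0]
        for col, cell in enumerate(row[1:], start=1):
            # Skip cells that have no header above them (ragged rows)
            if col < len(header):
                yield cell, header[col], row_label
```

With `boxes = [cv.boundingRect(c) for c in conts]`, each box yielded by `iter_cells(group_into_rows(boxes))` can be cropped from the original image as `img[y:y+h, x:x+w]` and shown to the user unmodified. Because `cv.RETR_EXTERNAL` returns only the outermost contours, this presumably relies on the image being cropped to the table; if there is a blank margin around the table, a different retrieval mode may be needed.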