Paddle OCR Issue when passing pdf file for text detection
Question
Hi, I am facing an issue when passing a PDF file to PaddleOCR.
My code is:
    !paddleocr --image_dir /content/SER-1678793239.pdf --use_angle_cls true --use_gpu false
The error I am getting is:
    AttributeError: 'Document' object has no attribute 'pageCount'
It works fine for image files, though.
I tried different things, such as changing the PDF file name and the number of pages, but nothing worked.
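Since image files work and only the PDF path fails, a useful first step is to confirm which PyMuPDF (imported as `fitz`) and PaddleOCR versions are actually installed. The following is a minimal standard-library sketch. It assumes the pip distribution names are `PyMuPDF` and `paddleocr`, and `installed_versions` is a hypothetical helper name, not part of either library:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("PyMuPDF", "paddleocr")):
    """Return {distribution name: installed version string, or None if absent}."""
    # Hypothetical diagnostic helper; the distribution names are assumptions.
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it in the same environment as the failing command, then compare the reported PyMuPDF version with the 1.19.0 pin suggested below.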
Answer 1
Score: 0
You can edit the file directly at C:\Python3.10.0\Lib\site-packages\paddleocr\ppocr\utils\utility.py (the exact location depends on where your Python packages are installed).
Starting at line 93:
    # Excerpt from PaddleOCR's utility.py; fitz, Image, cv2, np, imgs and
    # img_path are already imported/defined by the surrounding code.
    with fitz.open(img_path) as pdf:
        for pg in range(0, pdf.page_count):  # was: pdf.pageCount
            page = pdf[pg]
            mat = fitz.Matrix(2, 2)  # render at 2x zoom
            pm = page.get_pixmap(matrix=mat, alpha=False)  # was: page.getPixmap
            # if width or height > 2000 pixels, don't enlarge the image
            if pm.width > 2000 or pm.height > 2000:
                pm = page.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)
            img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
            img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
            imgs.append(img)
        return imgs, False, True
I changed the camelCase names to their snake_case equivalents, because the installed PyMuPDF (fitz) version no longer provides the old names:
- pageCount -> page_count
- getPixmap -> get_pixmap
You can also refer to this discussion: https://github.com/PaddlePaddle/PaddleOCR/discussions/8972
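Editing site-packages works, but the change is lost whenever PaddleOCR is reinstalled or upgraded. Another option, if you keep your own copy of the PDF-loading code, is a small compatibility helper that tries the snake_case name first and falls back to the camelCase one. This is only a sketch: `doc_page_count` and `page_pixmap` are hypothetical helper names, not part of PyMuPDF or PaddleOCR.

```python
def doc_page_count(doc):
    """Page count of a PyMuPDF Document, whichever naming the version uses."""
    if hasattr(doc, "page_count"):
        return doc.page_count  # snake_case property (newer PyMuPDF)
    return doc.pageCount  # camelCase name (older PyMuPDF)


def page_pixmap(page, **kwargs):
    """Render a page via get_pixmap, falling back to the older getPixmap."""
    # Raises AttributeError if neither name exists on the page object.
    render = getattr(page, "get_pixmap", None) or getattr(page, "getPixmap")
    return render(**kwargs)
```

In the loop above, you would then write `range(0, doc_page_count(pdf))` and `page_pixmap(page, matrix=mat, alpha=False)`, and the same code would run against either naming.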
Answer 2
Score: 0
I solved the issue by uninstalling the pymupdf library (which had been installed automatically along with paddleocr) with the command below:
    !pip uninstall pymupdf
Then I installed a specific version, pymupdf==1.19.0, and the issue was resolved:
    !pip install --ignore-installed pymupdf==1.19.0
Now it's working fine!
Note: the ! in front of a command tells the notebook to run it as a shell command rather than as Python code. If you run these commands outside a notebook (e.g. in a terminal), remove the leading !.
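Outside a notebook, you can also run the same two steps from Python through the current interpreter's pip. This avoids installing into a different environment by mistake. The following is a minimal sketch. `pymupdf_pin_commands` and `apply_pin` are hypothetical names, and `-y` (pip's flag to skip the uninstall confirmation prompt) is an addition not present in the commands above. Nothing is executed unless `dry_run=False` is passed.

```python
import subprocess
import sys


def pymupdf_pin_commands(pinned="pymupdf==1.19.0"):
    """Build the uninstall/reinstall commands for the current interpreter."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", "pymupdf"],  # -y: don't wait for confirmation
        pip + ["install", "--ignore-installed", pinned],
    ]


def apply_pin(dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for cmd in pymupdf_pin_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise CalledProcessError if pip fails


if __name__ == "__main__":
    apply_pin(dry_run=True)
```

Using `sys.executable -m pip` ties the install to the interpreter that will later import `fitz`, which is the main advantage over calling a bare `pip`.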