Error loading base64 image: PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO
Question
I have a base64 image string that I need to convert into an image so that I can analyze it with pytesseract:
```python
import base64
import io
from PIL import Image
import pytesseract
import sys
base64_string = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBwcHBw8LCwkMEQ8SEhEPERETFh....."
img_data = base64.b64decode(base64_string)
img = Image.open(io.BytesIO(img_data)) # <== ERROR LINE
text = pytesseract.image_to_string(img, config='--psm 6')
print(text)
```
Running this gives the following error on the marked line:
```
Traceback (most recent call last):
  File "D:\aa\xampp\htdocs\xbanca\aa.py", line 14, in <module>
    img = Image.open(io.BytesIO(img_data))
  File "D:\python3.10.10\lib\site-packages\PIL\Image.py", line 3283, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x000001A076F673D0>
```
I tried using the numpy and requests libraries, but they all give the same result. The base64 example image also works fine in other converters.
Answer 1 (score: 2)
That's a very common misunderstanding.
The string
```python
base64_string = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBwcHBw8LCwkMEQ8SEhEPERETFh....."
```
is not a Base64 string but a data URL:

> URLs prefixed with the data: scheme allow content creators to embed small files inline in documents.

The data URL contains the Base64 string as its payload.
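A data URL has the general form `data:[<mediatype>][;base64],<data>` (RFC 2397), so the header can be separated from the payload at the first comma. The following is a minimal sketch of that split. `parse_data_url` and `DataURL` are hypothetical helper names, not part of any library, and the function only splits the string without decoding anything.

```python
from typing import NamedTuple


class DataURL(NamedTuple):
    media_type: str  # e.g. "image/jpeg" (plus any ;param=value parts)
    is_base64: bool  # True when the header ends with ";base64"
    payload: str     # everything after the first comma, still encoded


def parse_data_url(url: str) -> DataURL:
    """Split data:[<mediatype>][;base64],<data> into its parts (no decoding)."""
    if url[:5].lower() != "data:":
        raise ValueError("not a data URL")
    # The header ends at the FIRST comma; everything after it is the payload.
    header, sep, payload = url[5:].partition(",")
    if not sep:
        raise ValueError("missing ',' between header and payload")
    is_base64 = header.lower().endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # Per RFC 2397, an omitted media type means text/plain;charset=US-ASCII.
    return DataURL(header or "text/plain;charset=US-ASCII", is_base64, payload)


print(parse_data_url("data:image/jpeg;base64,/9j/4AAQ"))
# DataURL(media_type='image/jpeg', is_base64=True, payload='/9j/4AAQ')
```

For the string in the question, `payload` is the part that should be passed to `base64.b64decode`.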
The Base64 string starts directly after 'base64,'. Therefore you need to cut off the 'data:image/jpeg;base64,' part.
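This sketch explains why the code in the question fails at `Image.open` rather than at `b64decode`. By default (`validate=False`), `base64.b64decode` discards characters outside the Base64 alphabet, such as `:`, `;` and `,`. The remaining prefix characters `dataimage/jpegbase64` are all valid Base64 digits, so they decode into 15 junk bytes that end up in front of the real JPEG data. This is consistent with Pillow's `UnidentifiedImageError`, because the buffer no longer starts with the JPEG signature. The payload below is made-up bytes for illustration, not a real image.

```python
import base64
import binascii

# A tiny fake payload that starts with the JPEG SOI marker (0xFF 0xD8).
payload = b"\xff\xd8\xff\xe0" + b"not a real image"
data_url = "data:image/jpeg;base64," + base64.b64encode(payload).decode("ascii")

# Default (validate=False): ':', ';' and ',' are silently dropped, but the
# letters and '/' of the prefix are valid Base64, so they decode to junk bytes.
wrong = base64.b64decode(data_url)
print(wrong[:4].hex(), len(wrong) - len(payload), wrong.endswith(payload))

# validate=True rejects the non-alphabet characters instead of skipping them.
try:
    base64.b64decode(data_url, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```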
For example:

```python
b64 = base64_string.split(",")[1]
```

After that, you can decode the data:

```python
img_data = base64.b64decode(b64)
```
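Below is a slightly more defensive variant of the split above, sketched under the assumption that the input may be either a bare Base64 string or a data URL. `decode_base64_image` is a hypothetical helper name. Unlike `split(",")[1]`, `str.partition` does not raise `IndexError` when there is no comma. `validate=True` turns stray characters into an error instead of silently corrupting the output. Whitespace is stripped first so that line-wrapped Base64 still passes strict validation.

```python
import base64


def decode_base64_image(text: str) -> bytes:
    """Return raw image bytes from a data URL or a bare Base64 string."""
    # Drop all whitespace so line-wrapped Base64 still passes strict validation.
    text = "".join(text.split())
    if text[:5].lower() == "data:":
        # Keep only the payload that follows the first comma.
        _, sep, text = text.partition(",")
        if not sep:
            raise ValueError("data URL has no ',' before the Base64 payload")
    # validate=True raises binascii.Error (a ValueError subclass) on
    # non-Base64 characters instead of silently discarding them.
    return base64.b64decode(text, validate=True)
```

The returned bytes can then be passed to `Image.open(io.BytesIO(img_data))` exactly as in the question's code.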
I modified the code from the question to use the Base64 encoding of a small JPEG test image, which I encoded on https://www.base64encode.org/, and got the expected text output.