Error loading base64 image: PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO

Question

I have a Base64 image string that I need to convert to an image so that I can analyze it with pytesseract:

import base64
import io
from PIL import Image
import pytesseract
import sys


base64_string = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBwcHBw8LCwkMEQ8SEhEPERETFh....."

img_data = base64.b64decode(base64_string)

img = Image.open(io.BytesIO(img_data)) # <== ERROR LINE

text = pytesseract.image_to_string(img, config='--psm 6')

print(text)

This gives the error:

Traceback (most recent call last):
  File "D:\aa\xampp\htdocs\xbanca\aa.py", line 14, in <module>
    img = Image.open(io.BytesIO(img_data))
  File "D:\python3.10.10\lib\site-packages\PIL\Image.py", line 3283, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x000001A076F673D0>

I tried using the numpy and requests libraries, but they all give the same result, and the Base64 example image works fine in other converters.

Answer 1

Score: 2

That's a very common misunderstanding.
The string

base64_string = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBwcHBw8LCwkMEQ8SEhEPERETFh....."

is not a Base64 string but a data URL:

> URLs prefixed with the data: scheme, allow content creators to embed small files inline in documents

The data URL contains a Base64 string, which starts directly after 'base64,'. Therefore you need to cut off the 'data:image/jpeg;base64,' prefix before decoding.
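To see why the failure shows up in `Image.open` rather than in `b64decode`, here is a small stdlib-only sketch. It uses a made-up 11-byte stand-in for the start of a JPEG header (not a real image). By default `base64.b64decode` discards characters outside the Base64 alphabet, so `:`, `;` and `,` disappear, but `dataimage/jpegbase64` is decoded as if it were data. This leaves junk bytes in front of the image, and PIL no longer recognises the format.

```python
import base64
import binascii

# Stand-in payload: only the first bytes of a JPEG header (SOI marker + APP0 "JFIF").
# This is NOT a complete image; it only shows where the bytes end up after decoding.
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
data_url = "data:image/jpeg;base64," + base64.b64encode(jpeg_header).decode("ascii")

# Decoding the whole data URL: ':', ';' and ',' are silently dropped, but
# "dataimage/jpegbase64" is decoded too, producing junk bytes before the image.
naive = base64.b64decode(data_url)
print(naive.startswith(b"\xff\xd8"))  # False: the JPEG SOI marker is no longer first

# With validate=True the same input is rejected instead of silently mangled.
try:
    base64.b64decode(data_url, validate=True)
    strict_ok = True
except binascii.Error as exc:
    strict_ok = False
    print("rejected:", exc)

# Decoding only the part after the first comma restores the original bytes.
payload = data_url.split(",", 1)[1]
fixed = base64.b64decode(payload)
print(fixed == jpeg_header)  # True
```

Passing `validate=True` is a cheap way to turn this kind of silent corruption into an immediate error at the decoding step.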

For example:

b64 = base64_string.split(",")[1]

After that you can decode the data:

img_data = base64.b64decode(b64)
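If the input can be either a data URL or a bare Base64 string, the two steps above could be wrapped in a small helper. This is a minimal sketch; `data_url_to_bytes` is a hypothetical name, and it assumes the `;base64` form of data URLs (percent-encoded data URLs are not handled).

```python
import base64


def data_url_to_bytes(value: str) -> bytes:
    """Return the raw bytes from a Base64 data URL or a bare Base64 string.

    Hypothetical helper: only the ';base64' form is supported,
    e.g. "data:image/jpeg;base64,<payload>".
    """
    value = value.strip()
    if value.startswith("data:"):
        # Split "data:<mediatype>;base64" from the payload at the FIRST comma.
        header, sep, payload = value.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("not a Base64 data URL: " + header[:40])
    else:
        payload = value  # assume the input is already plain Base64
    # validate=True rejects stray characters instead of silently dropping them.
    # Note: it also rejects embedded newlines, so drop it for line-wrapped input.
    return base64.b64decode(payload, validate=True)
```

With Pillow installed, the failing line from the question would then become `img = Image.open(io.BytesIO(data_url_to_bytes(base64_string)))`. The standard library can also decode data URLs by itself: `urllib.request.urlopen(base64_string).read()` returns the decoded bytes.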

I modified the code from the question and used the Base64 encoding of the following small JPEG image, which I encoded on https://www.base64encode.org/:
[Image: the small JPEG test image]

and got the expected text output:

huangapple
  • Published on February 19, 2023 at 13:36:41
  • If reposting, please keep the link to this article: https://go.coder-hub.com/75498214.html