pdf2image fails in docker container

Question

I have a Python project running in a Docker container, but I can't get `convert_from_path` (from the `pdf2image` library) to work. It works locally on my Windows PC, but not in the Linux-based Docker container.

The error I get each time is `Unable to get page count. Is poppler installed and in PATH?`

The relevant parts of my code look like this:

from pdf2image import convert_from_path
import os
from sys import exit

def my_function(file_source_path):
    try:
        pages = convert_from_path(file_source_path, 600, poppler_path=os.environ.get('POPPLER_PATH'))
    except Exception as e:
        print('Fail 1')
        print(e)
    try:
        pages = convert_from_path(file_source_path, 600)
    except Exception as e:
        print('Fail 2')
        print(e)
    try:
        pages = convert_from_path(file_source_path, 600, poppler_path=r'\usr\local\bin')
    except Exception as e:
        print('Fail 3')
        print(e)
        print(os.environ)
        exit('Exiting script')

In attempt 1 I try to reference the original poppler files installed on Windows. Basically, the path resolves to '/code/poppler', which is a bind mount defined as follows:

[snippet from docker-compose.yml]
- type: bind
  source: "C:/Program Files/poppler-0.68.0/bin"
  target: /code/poppler

In attempt 2 I simply leave the path out. In attempt 3 I tried a path that I found had worked locally for some other users.
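
As a side note on attempt 3, here is a minimal sketch (using only the standard library `pathlib`) of how a Linux system interprets the backslash-style path `r'\usr\local\bin'`. On POSIX a backslash is an ordinary character rather than a separator, so the string is treated as one relative file name and not as a directory path. The variable names `attempt_3` and `fixed` are only illustrative. It is also worth checking where the binaries actually live: Debian packages normally install to /usr/bin rather than /usr/local/bin.

```python
from pathlib import PurePosixPath

# The path from attempt 3, written with Windows-style backslashes.
attempt_3 = PurePosixPath(r'\usr\local\bin')

# On POSIX a backslash is a normal character, so this is a single relative name.
print(attempt_3.is_absolute())  # False
print(attempt_3.parts)          # ('\\usr\\local\\bin',)

# The forward-slash spelling is an absolute path with separate components.
fixed = PurePosixPath('/usr/local/bin')
print(fixed.is_absolute())      # True
print(fixed.parts)              # ('/', 'usr', 'local', 'bin')
```

So even with poppler installed, a path spelled this way would not point at any real directory inside the container.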

The relevant parts of my Dockerfile look like this:

FROM python:3.10

WORKDIR /code

# install poppler
RUN apt-get update
RUN apt-get install poppler-utils -y

COPY ./requirements.txt ./
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "./app.py"]

Answer 1

Score: 0

The issue was that my Docker image was not being rebuilt correctly. After clearing the build cache and trying again, the middle option (attempt 2) worked in combination with the Dockerfile above.

So RUN apt-get install poppler-utils -y in the Dockerfile, combined with not passing a path in the code, i.e. pages = convert_from_path(file_source_path, 600), will work: installing poppler-utils puts the poppler binaries in a directory that is already on PATH, so pdf2image finds them automatically.

The bind mount can also be removed from docker-compose.yml and from the .env file.
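
To confirm from inside the container that a rebuilt image really contains poppler, a small check along these lines may help. It is only a sketch: the helper name `find_poppler_tools` is made up for illustration, and it assumes the two command-line tools of interest are `pdfinfo` (used for the page count) and `pdftoppm` (used for rendering), both of which ship in poppler-utils.

```python
import shutil


def find_poppler_tools(names=("pdfinfo", "pdftoppm")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so a missing binary is easy to spot in the logs.
    for name, path in find_poppler_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If either tool is reported as missing, the image was most likely built from a stale cached layer, and rebuilding without the cache is the next thing to try.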

huangapple
  • Posted on April 11, 2023, 01:44:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75979398.html