pdf2image fails in Docker container

Question

I have a Python project running in a Docker container, but I can't get `convert_from_path` (from the `pdf2image` library) to work. It works locally on my Windows PC, but not in the Linux-based Docker container.

The error I get each time is `Unable to get page count. Is poppler installed and in PATH?`

The relevant parts of my code look like this:

    from pdf2image import convert_from_path
    import os
    from sys import exit

    def my_function(file_source_path):
        # Attempt 1: poppler path taken from the POPPLER_PATH environment variable
        try:
            pages = convert_from_path(file_source_path, 600, poppler_path=os.environ.get('POPPLER_PATH'))
        except Exception as e:
            print('Fail 1')
            print(e)
            # Attempt 2: no poppler path, rely on PATH
            try:
                pages = convert_from_path(file_source_path, 600)
            except Exception as e:
                print('Fail 2')
                print(e)
                # Attempt 3: hard-coded path
                try:
                    pages = convert_from_path(file_source_path, 600, poppler_path=r'\usr\local\bin')
                except Exception as e:
                    print('Fail 3')
                    print(e)
                    print(os.environ)
                    exit('Exiting script')

In attempt 1 I try to reference the original files saved on Windows. The path resolves to `/code/poppler`, which is a bind mount defined as follows:

    # snippet from docker-compose.yml
    - type: bind
      source: "C:/Program Files/poppler-0.68.0/bin"
      target: /code/poppler

In attempt 2 I simply leave the path out. In attempt 3 I tried a path that I found had worked for some other users locally.
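A side note on attempt 3: the path there is written with backslashes, which a Linux container does not treat as directory separators. The short sketch below uses only the standard library's `pathlib` to show how a POSIX system reads that string. It only illustrates the path syntax and makes no claim about where poppler actually lives in a given image (on Debian-based images, apt typically places the `poppler-utils` binaries in `/usr/bin`).

```python
from pathlib import PurePosixPath

# The path from attempt 3, written with Windows-style backslashes.
attempt_3 = PurePosixPath(r'\usr\local\bin')
# The same location written with forward slashes.
posix_style = PurePosixPath('/usr/local/bin')

# On POSIX a backslash is an ordinary character, so the whole string
# is one relative file name rather than an absolute directory path.
print(attempt_3.is_absolute())    # False
print(len(attempt_3.parts))       # 1
print(posix_style.is_absolute())  # True
```

Similarly, the binaries bind-mounted in attempt 1 are Windows executables, so a Linux container cannot run them even though the directory itself is visible.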

The relevant parts of my Dockerfile look like this:

    FROM python:3.10
    WORKDIR /code
    # install poppler
    RUN apt-get update
    RUN apt-get install poppler-utils -y
    COPY ./requirements.txt ./
    RUN pip install --upgrade pip
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    CMD ["python", "./app.py"]

Answer 1

Score: 0

The issue was that my Docker image was not being rebuilt correctly. After clearing the build cache and trying again, the middle option (attempt 2) worked in combination with the Dockerfile above.

So `RUN apt-get install poppler-utils -y` in the Dockerfile, combined with not passing a path in the code (`pages = convert_from_path(file_source_path, 600)`), will work, because installing `poppler-utils` puts the poppler binaries on the PATH, where `pdf2image` finds them automatically.

The bind mount can also be removed from docker-compose.yml and from the .env file.
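To check whether a stale image is the problem before calling `pdf2image`, one option is to verify that the poppler command-line tools are actually visible on PATH inside the container. The following is a minimal sketch, not part of the original code: the helper names `missing_poppler_tools` and `convert_pdf` are hypothetical, and it assumes the default `pdf2image` backend, which calls `pdfinfo` and `pdftoppm`.

```python
import shutil

# Command-line tools that pdf2image shells out to by default;
# both are part of the poppler-utils package.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def convert_pdf(file_source_path, dpi=600):
    """Convert a PDF to images, relying on PATH instead of poppler_path."""
    missing = missing_poppler_tools()
    if missing:
        # Fail early with a message that names the missing binaries.
        raise RuntimeError(f"poppler tools not found on PATH: {missing}")
    # Imported here so the PATH check above runs even without pdf2image.
    from pdf2image import convert_from_path
    return convert_from_path(file_source_path, dpi)
```

If `missing_poppler_tools()` returns a non-empty list inside the running container, the image was most likely built without the `poppler-utils` layer, which matches the stale build cache described above.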

huangapple
  • Posted on April 11, 2023 at 01:44:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75979398.html