Packages not installed during Docker build
Question
I'm trying to install tesseract-ocr in a Docker container based on the python:3.10 image. During the build, the installation appears to succeed, but afterwards I cannot find the files inside the container. If I then open a shell in the container and install the package there, it works.
The relevant parts of my Dockerfile look like this:
# debian based
FROM python:3.10
WORKDIR /code
RUN mkdir __logger
RUN apt-get update -y
RUN apt-get install apt-utils -y
# tesseract part, tried both apt & apt-get
RUN apt-get install tesseract-ocr -y
COPY ./requirements.txt ./
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "./app.py"]
Then I start the container with docker compose up, open a shell in it with docker exec -t -i my_container_name /bin/bash, and finally run find / -type d -name "*tesseract*", which yields no results.
If I run apt-cache search tesseract-ocr, I can see that the package is available in the list.
If I then run apt install tesseract-ocr inside the container terminal, I can see the files being installed. And if I run find / -type d -name "*tesseract*" again, I can see that tesseract is now installed:
root@06d4e841c6d2:/code# find / -type d -name "*tess*"
/usr/share/doc/tesseract-ocr-eng
/usr/share/doc/tesseract-ocr-osd
/usr/share/doc/tesseract-ocr
/usr/share/doc/libtesseract4
/usr/share/tesseract-ocr
/usr/share/tesseract-ocr/4.00/tessdata
/usr/share/tesseract-ocr/4.00/tessdata/tessconfigs
How can I get it to install correctly during the build phase?
Here is a snippet of the log towards the end of the build step RUN apt-get install tesseract-ocr -y:
#18 4.079 Preparing to unpack .../5-tesseract-ocr-osd_1%3a4.00~git30-7274cfa-1.1_all.deb ...
#18 4.086 Unpacking tesseract-ocr-osd (1:4.00~git30-7274cfa-1.1) ...
#18 4.447 Selecting previously unselected package tesseract-ocr.
#18 4.451 Preparing to unpack .../6-tesseract-ocr_4.1.1-2.1_amd64.deb ...
#18 4.463 Unpacking tesseract-ocr (4.1.1-2.1) ...
#18 4.552 Setting up libarchive13:amd64 (3.4.3-2+deb11u1) ...
#18 4.574 Setting up tesseract-ocr-eng (1:4.00~git30-7274cfa-1.1) ...
#18 4.596 Setting up libgif7:amd64 (5.1.9-2) ...
#18 4.618 Setting up tesseract-ocr-osd (1:4.00~git30-7274cfa-1.1) ...
#18 4.640 Setting up liblept5:amd64 (1.79.0-1.1+deb11u1) ...
#18 4.665 Setting up libtesseract4:amd64 (4.1.1-2.1) ...
#18 4.688 Setting up tesseract-ocr (4.1.1-2.1) ...
#18 4.710 Processing triggers for libc-bin (2.31-13+deb11u6) ...
#18 DONE 4.8s
Answer 1
Score: 0
I'm unable to reproduce your problem. I created a Docker image from this truncated Dockerfile:
# debian based
FROM python:3.10
WORKDIR /code
RUN mkdir __logger
RUN apt-get update -y
RUN apt-get install apt-utils -y
# tesseract part, tried both apt & apt-get
RUN apt-get install tesseract-ocr -y
I then built the image with docker build --tag stackoverflow:test . and started a container from it, where I was able to find tesseract:
% docker run -it stackoverflow:test /bin/bash
root@2e2e3599c939:/code# find / -type d -name "*tess*"
/usr/share/doc/tesseract-ocr
/usr/share/doc/libtesseract4
/usr/share/doc/tesseract-ocr-osd
/usr/share/doc/tesseract-ocr-eng
/usr/share/tesseract-ocr
/usr/share/tesseract-ocr/4.00/tessdata
/usr/share/tesseract-ocr/4.00/tessdata/tessconfigs
So this problem is a bit of a stumper. Here are a few things you can try that might help:
- Build the Docker image by itself, without using Docker Compose.
- When building, pass the --no-cache argument to docker build to bypass the layer cache.
- Make sure that you are running the newest version of Docker.
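The first two suggestions can be sketched as a few shell helpers. This is only a rough sketch, not a confirmed fix: the tag stackoverflow:test is reused from the example above, the function names are made up for illustration, and the Compose step assumes the service is built from the same Dockerfile. It is included because docker compose up reuses an image that already exists unless it is told to rebuild, so a stale image is one possible explanation worth ruling out.

```shell
#!/bin/sh
# Illustrative tag taken from the example above; substitute your own.
IMAGE_TAG="stackoverflow:test"

# Rebuild from scratch with plain `docker build`, bypassing both
# Docker Compose and the layer cache. (Hypothetical helper name.)
rebuild_image() {
    docker build --no-cache --tag "$IMAGE_TAG" .
}

# Ask a throwaway container whether the tesseract binary is on PATH.
# This is quicker than searching the filesystem with find.
check_tesseract() {
    docker run --rm "$IMAGE_TAG" sh -c 'command -v tesseract && tesseract --version'
}

# If the plain build looks fine, make Compose rebuild the image and
# recreate the container instead of reusing older ones.
recreate_with_compose() {
    docker compose build --no-cache && docker compose up --force-recreate
}
```

Run rebuild_image && check_tesseract from the directory containing the Dockerfile. If that prints a path and a version while the Compose-started container still lacks tesseract, the difference most likely lies in how Compose builds or reuses the image, and recreate_with_compose is the next thing to try.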