Docker multistage build - why not caching layer?
Question
I am trying to speed up my Docker multi-stage build.
In the first stage I copy `.` (except what is listed in `.dockerignore`) and build. I deliberately use `COPY . .` so that I don't have to list the interesting folders and to avoid `COPY` flattening. A `Dockerfile` snippet is at the end of this post.
I thought the `RUN rm -rf nginx/` would mean that, in development, if I changed a file in `nginx/` and rebuilt the image, the first stage would be skipped. But that is not working: the stage is rebuilt, which is slow. Am I misunderstanding how the caching works? Basically, I am trying to make changes to `nginx/`, and it is slow to rebuild everything just to try things out. Is there a way to avoid rebuilding the first layer if I change something in `nginx/`?
```dockerfile
FROM myimage/node:14.19.3 as build
WORKDIR /home/node
COPY . .
# During development NGINX changes should not rebuild this layer;
RUN rm -rf nginx/
RUN npm run build

FROM myimage/nginx:1.23.3
COPY ./nginx/nginx.conf /etc/nginx/nginx.conf
# SNIP
```
Answer 1
Score: 1
Generally speaking, each instruction in a Dockerfile becomes roughly one layer in the final image. Every layer adds content on top of the layers below it, and together the layers form a stack.
Whenever a layer changes, that layer needs to be rebuilt, and so do all the layers that come after it.
In your case, the instruction that prevents the initial layers from being cached is `COPY . .`. From a caching point of view it is very inefficient: changing any file in the build context invalidates that layer, so everything after it (copying files, installing dependencies, building) is redone on every build, even if the relevant files did not change.
If you cannot remove that instruction, try to move it so that it is executed as late as possible, for example:
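```dockerfile
FROM myimage/node:14.19.3 as build
WORKDIR /home/node
# Copy only the dependency manifests first, so the install layer stays cached
# until package.json or package-lock.json changes.
COPY package.json package-lock.json ./
RUN npm install
# Copy the rest of the sources as late as possible.
COPY . .
RUN rm -rf nginx/
# The build needs the sources, so it has to run after COPY . .
RUN npm run build
```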
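Applied to the Dockerfile from the question, the same idea can be taken one step further by not copying `nginx/` into the first stage at all. The sketch below is only an illustration: the directory names `src` and `public` are assumptions standing in for whatever `npm run build` actually reads, and they do not come from the question. Since no `COPY` in the first stage touches `nginx/`, editing `nginx/nginx.conf` should leave every layer of that stage cached.

```dockerfile
FROM myimage/node:14.19.3 AS build
WORKDIR /home/node
# Dependency manifests first: dependencies are reinstalled only when these change.
COPY package.json package-lock.json ./
RUN npm ci
# Hypothetical source directories; list whatever the build really needs.
# Copying a directory to a named destination keeps its structure (no flattening).
COPY src ./src
COPY public ./public
RUN npm run build

FROM myimage/nginx:1.23.3
# Only this stage reads nginx/, so only this stage is rebuilt when nginx/ changes.
COPY ./nginx/nginx.conf /etc/nginx/nginx.conf
# Remaining instructions are omitted, as in the question.
```

The trade-off is the one the question wanted to avoid: the folders have to be listed explicitly, and the list must be kept in sync with the project layout.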
Answer 2
Score: 1
## In short
Copying all the files and folders of your directory with `COPY . .` invalidates the cache whenever you modify any file in that folder, even an unrelated one. The cache is rebuilt starting from the first layer to which the edited file is relevant.
In this case, the relevance is that your change to `nginx.conf` invalidates the `COPY . .` layer: copying `.` copies every file, including the `.conf` one, so the checksum of the resulting layer **cannot** match the cached one and the layer cannot be reused.
[This answer explains how the checksums of layers are calculated][1]
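To make the checksum point concrete, the first stage from the question is repeated below with comments on how each instruction is matched against the cache. The comments are a sketch of the cache rules as I understand them, not output from a real build, and the exact file attributes that go into the comparison may differ between builder versions.

```dockerfile
FROM myimage/node:14.19.3 AS build
WORKDIR /home/node
# The cache lookup for COPY is based on the files being copied from the build
# context (everything not excluded by .dockerignore). nginx/nginx.conf is one
# of those files, so editing it invalidates this layer.
COPY . .
# This runs only after the COPY layer already exists, so it cannot influence
# the COPY cache lookup. Its parent layer changed, so it is rebuilt as well.
RUN rm -rf nginx/
# Same here: the parent layer changed, so the slow build runs again.
RUN npm run build
```

In other words, deleting `nginx/` inside the image comes too late: the decision to reuse or rebuild the `COPY . .` layer has already been made by then.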
## Long version
To explain better what is happening, and since I don't know what is in your app, I will use a similar example of mine: a simple React app. Let's dive in.
In this example, specific directories are copied:
```dockerfile
FROM node as build-stage
WORKDIR /app
COPY package*.json /app/
RUN apt-get update && apt-get install -y vim
RUN npm install
# pay attention to the following file
# I put this BEFORE all other folders
COPY ./uselessfile.txt ./veryuseless.txt
COPY ./node_modules ./node_modules
COPY ./public ./public
COPY ./src ./src
# not really necessary because we launch the npm run build
COPY ./build ./build
RUN npm run build
CMD ["npm", "start"]
```
When I run `docker build -t deusdog .`, I get this:
```text
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 385B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/node:latest 0.5s
=> [internal] load build context 1.7s
=> => transferring context: 3.21MB 1.4s
=> [ 1/13] FROM docker.io/library/node@sha256:fc738db1cbb81214be1719436605e9d7d84746e5eaf0629762aeba114aa0c28d 0.0s
=> CACHED [ 2/13] WORKDIR /app 0.0s
=> CACHED [ 3/13] COPY package*.json /app/ 0.0s
=> [ 4/13] RUN apt-get update && apt-get install -y vim 5.0s
=> [ 5/13] RUN npm install 35.4s
=> [ 6/13] COPY ./uselessfile.txt ./veryuseless.txt 0.9s
=> [ 7/13] COPY ./node_modules ./node_modules 20.2s
=> [ 8/13] COPY ./public ./public 0.0s
=> [ 9/13] COPY ./src ./src 0.0s
=> [10/13] COPY ./build ./build 0.0s
=> [11/13] RUN npm run build 20.6s
=> exporting to image 56.1s
=> => exporting layers 56.0s
=> => writing image sha256:349b2709592f57b9b61295aa3ae1989e5179c42e006d8e72e5beee08ae432db7
```
And, very importantly, if I rerun the same command without modifying anything, I get something like:
```text
=> [ 1/11] FROM docker.io/library/node@sha256:fc738db1cbb81214be1719436605e9d7d84746e5eaf0629762aeba114aa0c28d 0.0s
=> CACHED [ 2/11] WORKDIR /app 0.0s
=> CACHED [ 3/11] COPY package*.json /app/ 0.0s
=> CACHED [ 4/11] RUN apt-get update && apt-get install -y vim 0.0s
=> CACHED [ 5/11] RUN npm install 0.0s
=> CACHED [ 6/11] COPY ./uselessfile.txt ./veryuseless.txt 0.0s
=> CACHED [ 7/11] COPY ./node_modules ./node_modules 0.0s
=> CACHED [ 8/11] COPY ./public ./public 0.0s
=> CACHED [ 9/11] COPY ./src ./src 0.0s
=> CACHED [10/11] COPY ./build ./build 0.0s
=> CACHED [11/11] RUN npm run build 0.0s
=> exporting to image
```
You can see a lot of cached layers.
Now, if you make any modification to the file `uselessfile.txt` and then rerun the `docker build` command, you get:
```text
=> transferring context: 3.21MB 3.1s
=> [ 1/11] FROM docker.io/library/node@sha256:fc738db1cbb81214be1719436605e9d7d84746e5eaf0629762aeba114aa0c28d 0.0s
=> CACHED [ 2/11] WORKDIR /app 0.0s
=> CACHED [ 3/11] COPY package*.json /app/ 0.0s
=> CACHED [ 4/11] RUN apt-get update && apt-get install -y vim 0.0s
=> CACHED [ 5/11] RUN npm install 0.0s
=> [ 6/11] COPY ./uselessfile.txt ./veryuseless.txt 0.1s
=> [ 7/11] COPY ./node_modules ./node_modules 23.0s
=> [ 8/11] COPY ./public ./public 0.0s
=> [ 9/11] COPY ./src ./src 0.0s
=> [10/11] COPY ./build ./build 0.0s
=> [11/11] RUN npm run build 10.7s
=> => # babel-preset-react-app is part of the create-react-app project, which
=> => # is not maintianed anymore. It is thus unlikely that this bug will
=> => # ever be fixed. Add "@babel/plugin-proposal-private-property-in-object" to
=> => # your devDependencies to work around this error
```
As you can see, all the layers after the copy of `uselessfile.txt` are recreated. Every. Single. Time.
## Another (much faster) example
In the following example, I put the line `COPY ./uselessfile.txt ./veryuseless.txt` after a lot of other layers, like this:
```dockerfile
FROM node as build-stage
WORKDIR /app
COPY package*.json /app/
RUN apt-get update && apt-get install -y vim
RUN npm install
COPY ./node_modules ./node_modules
COPY ./public ./public
COPY ./src ./src
COPY ./build ./build
# finally I do a build
RUN npm run build
# here instead I put this AFTER all other folders and operations
COPY ./uselessfile.txt ./veryuseless.txt
CMD ["npm", "start"]
```
Then I run the `docker build` command and get:
```text
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 482B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/node:latest 1.2s
=> [auth] library/node:pull token for registry-1.docker.io 0.0s
=> [internal] load build context 6.0s
=> => transferring context: 3.21MB 5.4s
=> [ 1/11] FROM docker.io/library/node@sha256:fc738db1cbb81214be1719436605e9d7d84746e5eaf0629762aeba114aa0c28d 0.0s
=> CACHED [ 2/11] WORKDIR /app 0.0s
=> CACHED [ 3/11] COPY package*.json /app/ 0.0s
=> CACHED [ 4/11] RUN apt-get update && apt-get install -y vim 0.0s
=> CACHED [ 5/11] RUN npm install 0.0s
=> [ 6/11] COPY ./node_modules ./node_modules 22.5s
=> [ 7/11] COPY ./public ./public 0.2s
=> [ 8/11] COPY ./src ./src 0.2s
=> [ 9/11] COPY ./build ./build 0.2s
=> [10/11] RUN npm run build 30.3s
=> [11/11] COPY ./uselessfile.txt ./veryuseless.txt 0.0s
=> exporting to image 5.7s
=> => exporting layers
```
When I rerun the `docker build` command, even after modifying `uselessfile.txt`, only the last layer is rebuilt:
```text
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 37B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/node:latest 0.6s
=> [ 1/11] FROM docker.io/library/node@sha256:fc738db1cbb81214be1719436605e9d7d84746e5eaf0629762aeba114aa0c28d 0.0s
=> [internal] load build context 3.1s
=> => transferring context: 3.21MB 2.6s
=> CACHED [ 2/11] WORKDIR /app 0.0s
=> CACHED [ 3/11] COPY package*.json /app/ 0.0s
=> CACHED [ 4/11] RUN apt-get update && apt-get install -y vim 0.0s
=> CACHED [ 5/11] RUN npm install 0.0s
=> CACHED [ 6/11] COPY ./node_modules ./node_modules 0.0s
=> CACHED [ 7/11] COPY ./public ./public 0.0s
=> CACHED [ 8/11] COPY ./src ./src 0.0s
=> CACHED [ 9/11] COPY ./build ./build 0.0s
=> CACHED [10/11] RUN npm run build 0.0s
=> [11/11] COPY ./uselessfile.txt ./veryuseless.txt 0.3s
=> exporting to image 0.0s
=> => exporting layers
```
## Conclusions
Every layer depends on the preceding one, and when you use `COPY . .` as you do (which is fine for certain production scenarios), any change to any file recreates a lot of layers.
In any case, I suggest keeping one Dockerfile for development and one for production, and keeping them updated and as similar as possible.
For example, in the development one you can copy `node_modules` so that you can skip the `npm install` command, or use a mix of all these techniques. If you make a lot of edits to the `src` folder or the `App.js` file, there is no need to recreate `node_modules` on every `docker build`.
Moreover, you could edit files directly in the container with vim to speed up your development.
I hope this clarifies things!
Cose belle!