Kaniko build image job inside Kubernetes container gets OOMKilled when fetching git source context
Question
I am building an image with Kaniko in a container inside Kubernetes. When running the build job, it gets OOMKilled while fetching the source context from a remote git repository. I am using the latest version of the Kaniko executor image (gcr.io/kaniko-project/executor:latest) and my worker node has 8 GB of RAM.
The Dockerfile for my image is located in a remote git repository, and I am using the following build arguments:
f"--dockerfile=/images/Containerfile",
"--context=git://gitRepo.git#refs/heads/main",
f"--cache={False}",
"--verbosity=debug",
f"--cache-copy-layers={False}",
f"--compressed-caching={False}",
"--use-new-run",
f"--destination=mydestination"
#f" bunch of build args"
When running the build job, I see the following logs:
DEBU[0000] Getting source context from git://repo.git#refs/heads/main
DEBU[0000] Getting source from reference
Enumerating objects: 944, done.
Counting objects: 100% (879/879), done.
Compressing objects: 100% (464/464), done.
The build job exits with an OOMKilled error at the point where Kaniko is fetching the source context from the remote git repository. I was able to build normally not so long ago; this error started after I added a large 2 Gi SQL file to the same repo/source context. I still get the error even after removing the large file, and it now happens with every version of Kaniko.
I feel like the error is related to caching, and I've tried setting compressed-caching to False as suggested in issues 2491 and 1333. I don't have an issue accessing the repo, as all permissions work; the problem occurs while downloading the context. A point to note is that when using a 16 Gi node to run this container, it works 50% of the time. When it did work, I checked the usage: only initially does it use close to 12 to 15 Gi of memory, and for the rest of the actual build (until the build finishes) it uses about 2 Gi.
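For completeness, here is a simplified sketch of one way such a build job could be created with the official Python kubernetes client and given an explicit memory request/limit so the pod lands on a node with enough headroom. This is only an illustration, not my exact code; the memory values, job name, and namespace are placeholders:
from kubernetes import client, config

def create_kaniko_job(kaniko_args, namespace="default"):
    # Load in-cluster credentials (use load_kube_config() when running outside the cluster).
    config.load_incluster_config()
    # Kaniko executor container with the build arguments shown above and an
    # explicit memory request/limit.
    container = client.V1Container(
        name="kaniko",
        image="gcr.io/kaniko-project/executor:latest",
        args=kaniko_args,
        resources=client.V1ResourceRequirements(
            requests={"memory": "4Gi"},
            limits={"memory": "8Gi"},
        ),
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="kaniko-build"),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)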
Any suggestions on how to resolve this issue would be greatly appreciated.
Answer 1
Score: 1
Short Version:
I ended up using a different git repo as the source context, less than 100 MB in size, instead of the original git context, which was more than 2 Gi.
Longer Version:
The issue started right after adding the large SQL files to the original git source context. Kaniko was acting up, using 12+ Gi of memory. Using a 16 Gi memory instance in the k8s cluster worked 50% of the time. Naturally, I removed the larger files from the source context, expecting that to fix it.
But even after removing the large files from the repository/source context, the problem was not resolved. This led me to believe that there was a caching problem, which is when I decided to set caching and compressed-caching to false, as mentioned in the comments. However, even with caching disabled the issue persisted. I may be wrong, but I believe there was somehow an issue with the repository itself.
I switched to a different git source context which only had the most essential files needed by Kaniko (Dockerfile, nginx config files, etc.), reducing the repository size to less than 100 MB, and this worked!
I still don't know exactly why Kaniko was using so much memory to clone the files. I am still investigating and will post here once I find out, probably by next week.
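In the meantime, a rough way to sanity-check how much data Kaniko actually has to fetch for a given context is to clone the same branch yourself and measure it. The sketch below is only an illustration (the repo URL is a placeholder), not part of my original troubleshooting:
import subprocess
import tempfile
from pathlib import Path

def context_size_mb(repo_url, branch="main"):
    # Clone the branch into a temporary directory and return its on-disk size in MB,
    # including the .git objects, which is roughly what Kaniko has to download.
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["git", "clone", "--branch", branch, repo_url, tmp], check=True)
        total = sum(p.stat().st_size for p in Path(tmp).rglob("*") if p.is_file())
        return total / (1024 * 1024)

print(f"context size: {context_size_mb('https://example.com/gitRepo.git'):.1f} MB")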