My GKE pods stopped with error "no command specified: CreateContainerError"
Question
Everything was OK and the nodes were fine for months, but suddenly some pods stopped with an error. I tried deleting the pods and nodes, but the same issue persists.
Answer 1
Score: 1
Try the following possible solutions to resolve your issue:
Solution 1:
Check for a malformed character in your Dockerfile that could be causing the container to crash.
When you encounter a CreateContainerError, the first thing to check is that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. However, if you don't have access to the Dockerfile, you can configure your pod object by using a valid command in the command attribute of the object (a sketch follows below).
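For example, a minimal sketch from the command line, assuming a hypothetical image my-registry/my-app:latest, pod name my-app, and start script /app/start.sh (all placeholders, not from the original question):

    # Verify which ENTRYPOINT/CMD the image actually declares; an empty result
    # is consistent with the "no command specified" error
    docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' my-registry/my-app:latest

    # If you cannot rebuild the image, start the pod with an explicit command so
    # Kubernetes does not depend on the image's missing ENTRYPOINT/CMD
    # (/app/start.sh is a placeholder for your container's real start command)
    kubectl run my-app --image=my-registry/my-app:latest --command -- /bin/sh -c "exec /app/start.sh"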
So a workaround is to not specify any workerConfig explicitly, which makes the workers inherit all configs from the master.
Refer to Troubleshooting the container runtime, the similar SO1 and SO2 threads, and also check this similar GitHub link for more information.
Solution 2:
The kubectl describe pod podname command provides detailed information about each of the pods that make up your Kubernetes infrastructure. With its help you can check for clues; if you see Insufficient CPU, follow the solution below.
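A quick way to surface those clues, assuming a hypothetical namespace my-namespace:

    # Describe the failing pod and read the Events section for messages such as
    # "no command specified" or "Insufficient cpu"
    kubectl describe pod podname -n my-namespace

    # Recent events across the namespace, newest last, can also point at the cause
    kubectl get events -n my-namespace --sort-by=.lastTimestamp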
The solution is to either:
1) Upgrade the boot disk: if using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.
2) Increase the disk size.
3) Use a node pool with a machine type that has more CPU cores (see the sketch after this list).
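A minimal sketch of option 3 with gcloud, using placeholder names (my-cluster, bigger-pool) and an example zone, machine type, and disk; adjust them to your environment:

    # Create a replacement node pool with more CPU cores and a faster, larger boot
    # disk, then migrate workloads off the old pool and delete it
    gcloud container node-pools create bigger-pool \
      --cluster=my-cluster \
      --zone=us-central1-a \
      --machine-type=e2-standard-4 \
      --disk-type=pd-balanced \
      --disk-size=100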
See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.
If you still have the issue, you can then update the GKE version for your cluster by manually upgrading the control plane to one of the fixed versions.
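A hedged sketch of such a manual upgrade, using a placeholder cluster name and version:

    # List the versions available in your zone, then upgrade the control plane
    gcloud container get-server-config --zone=us-central1-a
    gcloud container clusters upgrade my-cluster \
      --zone=us-central1-a \
      --master \
      --cluster-version=FIXED_VERSION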
Also check whether you have updated in the last year to use the new kubectl authentication plugin that comes with GKE v1.26.
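If not, the plugin GKE v1.26 expects is gke-gcloud-auth-plugin; installing it and refreshing credentials looks roughly like this (placeholder cluster name):

    # Install the new auth plugin and refresh cluster credentials
    gcloud components install gke-gcloud-auth-plugin
    gcloud container clusters get-credentials my-cluster --zone=us-central1-a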
Solution 3:
If you have a pipeline on GitLab that deploys an image to a GKE cluster, check the version of the GitLab runner that handles the jobs of your pipeline.
It turns out that every image built through a GitLab runner running an old version causes this issue at container start. Simply deactivate those runners, keep only GitLab runners on the latest version in the pool, and replay all pipelines.
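To see which version a given runner is on, you can run this on the machine hosting it:

    # Print the GitLab Runner version on the host that executes your jobs
    gitlab-runner --version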
Also check whether your GitLab CI script uses an old Docker image such as docker:19.03.5-dind; updating it to docker:dind helps Kubernetes start the pod again.
Comments