My GKE pods stopped with error "no command specified: CreateContainerError"

Question

Everything was OK and the nodes were fine for months, but suddenly some pods stopped with the error "no command specified: CreateContainerError".

I tried deleting the pods and the nodes, but the issue persists.

Answer 1

Score: 1

Try the possible solutions below to resolve your issue:

Solution 1:

Check your Dockerfile for a malformed character that could cause the container to crash.
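As a rough aid for that check, here is a minimal Python sketch that lists characters in a Dockerfile which are often pasted in by accident. What counts as "malformed" is an assumption here: the sketch flags any non-ASCII character (such as curly quotes) and any control character other than tab (such as a stray carriage return). The function name `find_suspicious_chars` is hypothetical.

```python
import unicodedata


def find_suspicious_chars(dockerfile_text):
    """Return (line, column, description) for each suspicious character.

    Assumption: "suspicious" means non-ASCII, or a control/format character
    other than tab. Legitimate non-ASCII text in comments is flagged too.
    """
    findings = []
    # Split on "\n" only, so that a stray "\r" from CRLF endings is reported.
    for line_no, line in enumerate(dockerfile_text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            is_control = unicodedata.category(ch).startswith("C") and ch != "\t"
            if is_control or ord(ch) > 127:
                description = unicodedata.name(ch, f"U+{ord(ch):04X}")
                findings.append((line_no, col, description))
    return findings
```

Calling it on the text read from your Dockerfile returns an empty list when nothing suspicious is found; any entry points at a line and column worth inspecting by hand.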

When you encounter CreateContainerError, check that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. If you don't have access to the Dockerfile, you can instead set a valid command in the command attribute of your pod object.
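Both options can be sketched in a few lines of Python. This is an illustration only: the helper names, the pod name, the image reference and the command below are hypothetical placeholders, and the Dockerfile check is a simple text match that does not see an ENTRYPOINT or CMD inherited from the base image.

```python
import json
import re

# Matches a line that begins with an ENTRYPOINT or CMD instruction.
_START_INSTRUCTION = re.compile(r"^\s*(ENTRYPOINT|CMD)\b", re.IGNORECASE | re.MULTILINE)


def has_start_command(dockerfile_text):
    """True if the Dockerfile text itself declares ENTRYPOINT or CMD."""
    return bool(_START_INSTRUCTION.search(dockerfile_text))


def pod_manifest(name, image, command, args=None):
    """Build a Pod manifest whose container sets an explicit command."""
    container = {"name": name, "image": image, "command": list(command)}
    if args:
        container["args"] = list(args)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [container]},
    }


def to_json(manifest):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

For example, writing `to_json(pod_manifest("demo", "example.com/demo:1.0", ["/bin/sh", "-c"], ["sleep 3600"]))` to a file gives a manifest that can be passed to `kubectl apply -f`. In a pod spec, `command` overrides the image's ENTRYPOINT and `args` overrides its CMD.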

So the workaround is to not specify any workerConfig explicitly, which makes the workers inherit all configs from the master.

Refer to Troubleshooting the container runtime, the similar questions SO1 and SO2, and this similar GitHub link for more information.

Solution 2:

The kubectl describe pod podname command provides detailed information about a pod, including its recent events. Use it to check for clues; if it reports Insufficient CPU, follow the solution below.

The solution is to either:

1) Upgrade the boot disk: if you are using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.

2) Increase the disk size.

3) Use a node pool whose machine type has more CPU cores.
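The "check for clues" step in Solution 2 can be sketched in Python as a filter over the text that kubectl describe pod prints. The function names and the list of search strings are assumptions made for illustration; `describe_pod` simply shells out to kubectl, which must be installed and pointed at your cluster.

```python
import subprocess

# Substrings to look for; chosen for illustration, extend as needed.
DEFAULT_NEEDLES = ("Insufficient cpu", "CreateContainerError", "no command specified")


def describe_pod(pod_name, namespace="default"):
    """Return the text printed by `kubectl describe pod` (needs kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", "describe", "pod", pod_name, "--namespace", namespace],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_clues(describe_output, needles=DEFAULT_NEEDLES):
    """Return the stripped lines that contain any needle, case-insensitively."""
    lowered_needles = [n.lower() for n in needles]
    return [
        line.strip()
        for line in describe_output.splitlines()
        if any(n in line.lower() for n in lowered_needles)
    ]
```

A typical use would be `find_clues(describe_pod("podname"))`. The matching is deliberately loose, so treat the result as a pointer to the relevant event lines and not as a diagnosis.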

See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.

If you still have the issue, you can update the GKE version of your cluster by manually upgrading the control plane to one of the fixed versions.

Also check whether you have updated in the last year to use the new kubectl authentication plugin coming in GKE v1.26.

Solution 3:

If you have a pipeline on GitLab that deploys an image to a GKE cluster, check the version of the GitLab runner that handles your pipeline's jobs.

It turns out that every image built through a GitLab runner running an old version causes this issue at container start. Deactivate the old runners, leave only runners on the latest version in the pool, and replay all pipelines.

Also check whether the GitLab CI script uses an old Docker image such as docker:19.03.5-dind; updating to docker:dind helps Kubernetes start the pod again.
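To find such references quickly, the following Python sketch scans the text of a CI script for docker images pinned to a numeric version. It is a plain text search and not a YAML parser, the function name is hypothetical, and it cannot tell whether a pinned version is actually too old; it only lists candidates to review.

```python
import re

# Matches docker:<numeric version>, optionally followed by -dind,
# e.g. docker:19.03.5 or docker:19.03.5-dind. docker:dind does not match.
_PINNED_DOCKER_IMAGE = re.compile(r"\bdocker:\d+(?:\.\d+)*(?:-dind)?")


def find_pinned_docker_images(ci_text):
    """Return (line_number, image_reference) for each pinned docker image."""
    findings = []
    for line_no, line in enumerate(ci_text.splitlines(), start=1):
        for match in _PINNED_DOCKER_IMAGE.finditer(line):
            findings.append((line_no, match.group(0)))
    return findings
```

Run it on the contents of your `.gitlab-ci.yml`; each returned pair names a line to check and the image reference found there.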

huangapple
  • Posted on 2023-02-08 09:00:51