Under what circumstances will completed pods not be recycled?
Question
Recently, a lot of completed pods have accumulated on our k8s cluster, which I suspect is related to insufficient cluster resources.
The Nextflow task I created contains several processes. Normally they run sequentially, and a new pod is created after the previous pod completes. Recently, however, a large number of tasks have been submitted to the cluster, and while watching it I saw many completed pods appear and the tasks got stuck. Is this related to cluster resources or to Nextflow, and what else could happen if these pods keep failing to be recycled?
Answer 1
Score: 1
More details are needed, but if your pods are reaching the Completed status, the code ran properly: their containers exited successfully (exit code 0).
In any case, if your pod throws an error or fails, it crashes instead, and its status shows Error (or CrashLoopBackOff if it keeps restarting) rather than Completed.
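To make the Completed vs. Error distinction concrete, here is a minimal sketch that reads one entry of `status.containerStatuses` from the JSON printed by `kubectl get pod <name> -o json` and reports what the container's exit means. It assumes the standard pod JSON layout; the function name `describe_termination` is hypothetical.

```python
from typing import Optional


def describe_termination(container_status: dict) -> Optional[str]:
    """Return a short verdict for a terminated container, or None if it
    has not exited yet (still running or waiting).

    `container_status` is one element of `status.containerStatuses` in the
    output of `kubectl get pod <name> -o json` (hypothetical helper).
    """
    terminated = container_status.get("state", {}).get("terminated")
    if terminated is None:
        return None  # the container is still running or waiting
    exit_code = terminated.get("exitCode")
    reason = terminated.get("reason", "")
    if exit_code == 0:
        # A zero exit code is what kubectl displays as "Completed".
        return "Completed: the process exited successfully"
    # Non-zero exits are reported as "Error", or "OOMKilled" when the
    # container was killed for exceeding its memory limit.
    return f"{reason or 'Error'}: exit code {exit_code}"
```

For example, a container that exited with code 1 and reason `Error` yields `Error: exit code 1`, which is never displayed as Completed.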
If there were a resource issue in K8s, the pod would not be scheduled at all and would stay in the Pending state.
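One way to check the resource hypothesis is to look for pods the scheduler could not place. The sketch below assumes the JSON from `kubectl get pods -o json` and relies on the `PodScheduled` condition, which the scheduler sets to `False` with reason `Unschedulable` when no node can satisfy a pod's requests. Both helper names are hypothetical.

```python
from typing import List


def is_unschedulable(pod: dict) -> bool:
    """True if this pod is Pending because the scheduler could not place it
    (for example, not enough CPU or memory on any node)."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False
    for cond in status.get("conditions", []):
        # PodScheduled=False with reason "Unschedulable" means no node fits.
        if (cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"):
            return True
    return False


def unschedulable_pod_names(pod_list: dict) -> List[str]:
    """Names of unschedulable pods in a `kubectl get pods -o json` result."""
    return [p["metadata"]["name"]
            for p in pod_list.get("items", [])
            if is_unschedulable(p)]
```

Save the output with `kubectl get pods -o json > pods.json`, load it with `json.load`, and pass the result to `unschedulable_pod_names`; an empty list suggests resources are not what is blocking scheduling.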
Another possibility is that the pod starts crashing or gets an OOM (out-of-memory) kill event; in that case, you have to keep checking the pod's status.
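Checking for crashes and OOM kills can be scripted as well. This is a minimal sketch, again assuming the `kubectl get pods -o json` layout: it flags containers that are waiting in `CrashLoopBackOff` or whose current or previous termination reason is `OOMKilled`, along with their restart count. The function name `crashing_containers` is hypothetical.

```python
from typing import Dict, List


def crashing_containers(pod: dict) -> List[Dict[str, object]]:
    """List containers in `pod` that are crash-looping or were OOM-killed.

    `pod` is one item from `kubectl get pods -o json` (hypothetical helper).
    """
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        waiting = state.get("waiting") or {}
        current = state.get("terminated") or {}
        # lastState holds the previous termination if the container restarted.
        last = cs.get("lastState", {}).get("terminated") or {}
        reasons = set()
        if waiting.get("reason") == "CrashLoopBackOff":
            reasons.add("CrashLoopBackOff")
        if "OOMKilled" in (current.get("reason"), last.get("reason")):
            reasons.add("OOMKilled")
        if reasons:
            findings.append({
                "pod": pod.get("metadata", {}).get("name"),
                "container": cs.get("name"),
                "reasons": sorted(reasons),
                "restarts": cs.get("restartCount", 0),
            })
    return findings
```

Because the pod status changes over time, run it periodically against fresh `kubectl get pods -o json` output rather than once.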