GKE Pod Migration to 2nd Nodepool blocked by PDB and 2nd Nodepool not scaling up


Question


I have an old nodepool with machine type X, and I am migrating its workload to a new nodepool with machine type Y. Both nodepools are up, and I've disabled autoscaling on the old nodepool. I cordoned the old nodepool, then drained the nodes using the command provided by the GKE docs:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=old-pool -o=name); do
  kubectl drain --force --ignore-daemonsets --delete-emptydir-data --grace-period=300 "$node";
done

The workloads have a pod disruption budget (PDB) with minAvailable: 1, and the new nodepool has autoscaling set up with min 1 and max around 10.
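
For reference, a minimal sketch of what such a PDB could look like (my-app-pdb and the app: my-app selector are hypothetical placeholders, not the actual manifest):

kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
EOF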

My problem is that the drain does not complete; it gets stuck with the error "Cannot evict pod as it would violate the pod's disruption budget". It makes sense that it doesn't evict the pod, but at the same time, shouldn't it trigger the creation of the same pod on the new nodepool?

Since new pods are not being created on the new nodepool to let the old nodes drain, I cannot migrate my workload (at least not without disruption, which I cannot accept).

What am I missing here?

Answer 1

Score: 1


The log "Cannot evict pod as it would violate the pod's disruption budget" confirms that the root cause is the pod disruption budget (PDB) configuration.

The main fields of a PDB are minAvailable and maxUnavailable.

spec.minAvailable: the minimum number of pods that must remain available after the eviction.

spec.maxUnavailable: the maximum number of pods that can be unavailable during an eviction.
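
If the workload runs a single replica with minAvailable: 1, the budget allows 1 - 1 = 0 disruptions, which is exactly the state that blocks a drain. A quick way to confirm (a sketch; my-app-pdb is a placeholder name):

# ALLOWED DISRUPTIONS = currently healthy pods - minAvailable; 0 means every eviction is refused.
kubectl get pdb my-app-pdb

# Shows "Disruptions Allowed" plus the pods the selector currently matches.
kubectl describe pdb my-app-pdb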

According to the official documentation, an application's PodDisruptionBudget can also prevent autoscaling; if deleting nodes would cause the budget to be exceeded, the cluster does not scale down.

When scaling down, the cluster autoscaler respects scheduling and eviction rules set on Pods. These restrictions can prevent a node from being deleted by the autoscaler. A node's deletion could be prevented if it contains a Pod with any of these conditions:

> - The Pod's affinity or anti-affinity rules prevent rescheduling.
> - The Pod is not managed by a Controller such as a Deployment, StatefulSet, Job or ReplicaSet.
> - The Pod has local storage and the GKE control plane version is lower than 1.22. In GKE clusters with control plane version 1.22 or later, Pods with local storage no longer block scaling down.
> - The Pod has the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation.
> - The node's deletion would exceed the configured PodDisruptionBudget, so the operation cannot complete while a Deployment holds that PDB. Based on the behavior the PDB defines, you can change its minAvailable or maxUnavailable values (see the sketch after this list).
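
Since you cannot tolerate disruption, one approach is to temporarily raise the replica count so the PDB permits an eviction; with the old pool cordoned, the extra replica can only schedule onto the new nodepool. A sketch, assuming a single-replica Deployment named my-app (a placeholder):

# Add a replica; the old pool is cordoned, so it lands on the new nodepool.
kubectl scale deployment/my-app --replicas=2
kubectl rollout status deployment/my-app

# The PDB now allows one disruption (2 healthy - minAvailable 1), so the drain can evict
# the old pod, and the Deployment recreates it on the new pool.
kubectl drain --force --ignore-daemonsets --delete-emptydir-data --grace-period=300 <old-node>

# Once everything runs on the new pool, scale back down.
kubectl scale deployment/my-app --replicas=1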

Check whether any of the above conditions, along with the PDB, are preventing the node's deletion in your deployment.
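
A couple of quick checks for the other conditions (pod names are placeholders):

# Is the safe-to-evict annotation set to "false" on the stuck pod?
kubectl get pod <pod-name> -o yaml | grep safe-to-evict

# Is the pod managed by a controller? ownerReferences should list one, e.g. ReplicaSet.
kubectl get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[*].kind}'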

Hope this information is useful to you.
