Kubernetes scaledown ignoring PDB
Question
We're getting consistent node scaledowns in GKE Autopilot that make our application unavailable for a few seconds. We have two replicas and a PDB stating that at least one needs to be available. We haven't set up any anti-affinity (I'll be doing that next), and both replicas end up on the same node.
According to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#does-ca-work-with-poddisruptionbudget-in-scale-down: "Before starting to terminate a node, CA makes sure that PodDisruptionBudgets for pods scheduled there allow for removing at least one replica. Then it deletes all pods from a node through the pod eviction API." Do I understand correctly that if both replicas are on the same node, this condition will be met because technically one replica can be removed? Does it just ignore the fact that both replicas will be gone in this case?
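(For context on the anti-affinity step mentioned above: a minimal sketch of a hard anti-affinity rule that spreads replicas across nodes, placed in the Deployment's pod template spec. The app: my-app label is an assumption, not taken from the question.)

```yaml
# Hypothetical snippet for the pod template spec; adjust the label
# selector to whatever labels your pods actually carry.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-app
      topologyKey: kubernetes.io/hostname
```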
For reference, here's our PDB status:
status:
  conditions:
  - lastTransitionTime: "2023-07-28T16:03:34Z"
    message: ""
    observedGeneration: 1
    reason: SufficientPods
    status: "True"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 1
  disruptionsAllowed: 1
  expectedPods: 2
  observedGeneration: 1
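The numbers in that status follow from simple arithmetic applied by the disruption controller; a minimal sketch of it (not the real controller code), where desiredHealthy is derived from the PDB's minAvailable:

```python
def disruptions_allowed(current_healthy: int, desired_healthy: int) -> int:
    """Voluntary evictions are allowed only for healthy pods above the floor."""
    return max(0, current_healthy - desired_healthy)

# Matches the status above: currentHealthy: 2, desiredHealthy: 1.
# The result is 1 regardless of which node the two pods share.
print(disruptions_allowed(2, 1))  # 1
```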
Answer 1
Score: 1
A PDB gives you a way to limit evictions by setting minAvailable, the minimum number of pods of a given type that must stay available. With minAvailable: 1, at least one replica should be running at any time: once the number of running replicas drops to that floor, Kubernetes blocks further voluntary disruptions until more replicas are healthy again.
So, in your case, if you set minAvailable to 2 with two replicas, disruptionsAllowed becomes 0 and no voluntary eviction is permitted at all, which prevents the autoscaler from draining the node. When a disruption does occur, Kubernetes attempts to gracefully evict pods from the affected node(s) while maintaining the number of replicas specified in the PDB.
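To illustrate the difference between the two settings, here is a hedged sketch that treats desiredHealthy as equal to an absolute minAvailable (which holds for a plain integer minAvailable, not a percentage):

```python
def disruptions_allowed(healthy: int, min_available: int) -> int:
    # For an absolute minAvailable, desiredHealthy == minAvailable.
    return max(0, healthy - min_available)

# minAvailable: 1 -> one eviction is allowed, so a node drain can start.
print(disruptions_allowed(healthy=2, min_available=1))  # 1
# minAvailable: 2 -> disruptionsAllowed is 0, so voluntary evictions
# (including autoscaler scale-down) are refused outright.
print(disruptions_allowed(healthy=2, min_available=2))  # 0
```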
Refer to this blog by Ink Insight, which explains Kubernetes Pod Disruption Budgets (PDBs) clearly. From that blog, here's an example of a PDB that sets minAvailable to 2 for a deployment named "my-deployment" in the "my-namespace" namespace (the apiVersion is updated here to policy/v1, since policy/v1beta1 was removed in Kubernetes 1.25):
> apiVersion: policy/v1
> kind: PodDisruptionBudget
> metadata:
>   name: my-pdb
>   namespace: my-namespace
> spec:
>   minAvailable: 2
>   selector:
>     matchLabels:
>       app: my-deployment
You can also refer to the official docs on specifying a Disruption Budget for your Application and on considering Pod scheduling and disruption.
EDIT: After looking into this further, here is what I understood.
As you quoted from the doc: "CA makes sure that PodDisruptionBudgets for pods scheduled there allow for removing at least one replica. Then it deletes all pods from a node through the pod eviction API."
In practice it doesn't necessarily delete all the pods. The PDB is checked on every eviction call: with minAvailable: 1, the first eviction succeeds, but the second is refused until a replacement pod is healthy somewhere else, which keeps that pod on the node for the time being.
Then, as mentioned in the same doc, "If one of the evictions fails, the node is saved and it is not terminated" — so the node is spared from disruption.
If the PDB condition is not specified properly, the drain will delete all the pods and disrupt the workload. The autoscaler does not simply ignore the fact that both replicas would go; the eviction API enforces the PDB pod by pod.
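The eviction-by-eviction gating described above can be sketched as a toy simulation. Assumptions: minAvailable: 1, both replicas sit on the drained node, and no replacement pod becomes ready during the drain (the worst case for availability):

```python
def drain_node(pods_on_node: int, healthy_total: int, min_available: int):
    """Simulate eviction-API calls during a node drain, gated by a PDB.

    Returns (evicted, blocked) counts. This is a sketch, not the real
    autoscaler logic: it assumes no replacement pod becomes ready while
    the drain is in progress.
    """
    evicted = blocked = 0
    for _ in range(pods_on_node):
        if healthy_total - min_available > 0:  # disruptionsAllowed > 0
            healthy_total -= 1                 # eviction succeeds
            evicted += 1
        else:
            blocked += 1                       # eviction refused by the PDB
    return evicted, blocked

# Both replicas on the same node, minAvailable: 1: one eviction goes
# through, the second is blocked, so the node drain stalls.
print(drain_node(pods_on_node=2, healthy_total=2, min_available=1))  # (1, 1)
```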