How can I configure StatefulSets for zone-affinity pod scheduling with LRS disks on AKS?

Question

I inherited an AKS cluster running in Switzerland North. This region doesn't provide ZRS managed disks, only LRS. Switching to ReadWriteMany (Azure Files) is not an option.

I have a single system node pool spanning all three availability zones. I also have a custom storage class that allows dynamic block storage provisioning, and a StatefulSet that defines a persistent volume claim template (a sketch follows the storage class below).

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: my-block-sc
parameters:
  cachingmode: ReadOnly
  diskEncryptionSetID: ...
  diskEncryptionType: EncryptionAtRestWithCustomerKey
  networkAccessPolicy: DenyAll
  skuName: StandardSSD_LRS
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
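
For illustration, a minimal sketch of a StatefulSet with such a volume claim template, assuming it uses the storage class above (the workload name, image, and size are placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                      # placeholder name
spec:
  serviceName: my-app
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25         # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: my-block-sc   # the storage class shown above
        resources:
          requests:
            storage: 10Gi               # placeholder size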

Now, from time to time, pods get stuck in a pending state. This is because the default scheduler tries to create a pod on a node, not in the same zone as the PV (LRS disk).

Of course, I could configure a node affinity and bind all pods to a single zone. But then I can't profit from HA and pods being spread across zones.

So, how can I configure a stateful set so that, after a crash or restart of a pod, the pod gets scheduled again in the same zone?

Is there some dynamic way of providing a node affinity to a pod template spec?

Answer 1

Score: 1

I was experiencing a similar issue and this post helped me. I hope it can help you too. Link here

Essentially, you want to make sure the PVC is referenced correctly in the claimRef of your PV. Then, check that the PVC is defined correctly in your StatefulSet (or whatever you use to deploy your pods). You can refer to the binding section of the Persistent Volumes documentation for more info on claimRef.

This should ensure that the pod is deployed to the same availability zone on restarts. If, after all that, you are still having issues, it could be that the node simply does not have any more room for your pod, so it stays pending. In that case, you might want to consider implementing priority and a preemption policy that can evict lower-priority pods to make room (a sketch follows below). Another solution would be to scale your nodes vertically so they can accommodate more pods. Reference for priority and preemption.

I hope this helps!
