How do I safely change the instance type of my EKS worker nodes?

Question
I have an EKS cluster provisioned with the Terraform EKS module. When I exceeded the available pod count, autoscaling kicked in and gave me an extra instance, as expected.
Then I decided to change the instance type, because I needed a more CPU-oriented one. I changed the instance type in the module and applied the changes.
I expected Terraform to create the new node group and its workers first, then drain the old nodes onto the new ones to keep the pods up. Instead, all of the workers were destroyed and then provisioned again, causing a few minutes of downtime.
How would I go about changing the instance type via Terraform next time while keeping the pods running?
Answer 1
Score: 2
Since node group instance types are immutable (as mentioned in this SO answer), Terraform is most likely deleting the node group and recreating it, deleting all of the pods running on it in the process.
To avoid this, you will have to:
- Add a new node group with the new instance type.
- Gradually cordon and drain all nodes in the old node group, and make sure the evicted pods are scheduled as expected on the new node group.
- Once all pods are scheduled on the new node group, delete the old node group.
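As a sketch, the first step might look like the following when using the terraform-aws-eks module. The group names, instance types, and sizes here are illustrative assumptions; check the exact argument names against the module version you are running:

```hcl
# Sketch for the terraform-aws-eks module: run both node groups side by side
# while migrating. Names, instance types, and sizes are assumptions.
eks_managed_node_groups = {
  # Existing group, kept temporarily so pods have somewhere to run.
  general = {
    instance_types = ["t3.large"]
    min_size       = 2
    max_size       = 5
    desired_size   = 2
  }

  # New group with the CPU-oriented instance type.
  compute = {
    instance_types = ["c5.xlarge"]
    min_size       = 2
    max_size       = 5
    desired_size   = 2
  }
}
```

After `terraform apply` brings the new group up, cordon and drain the old nodes one at a time (`kubectl cordon <node>` followed by `kubectl drain <node> --ignore-daemonsets --delete-emptydir-data`), verify the pods land on the new group, then remove the old group's block and apply again.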