How do I safely change the instance type of my EKS worker nodes?

Question
I have an EKS cluster provisioned with the Terraform EKS module. When I exceeded the available pod count, autoscaling kicked in and gave me an extra instance, as expected.
Then I decided to change the instance type, because I needed a more CPU-oriented one. I changed the instance type in the module and applied the changes.
I expected Terraform to create the new node group and its workers first, then drain the old nodes onto the new ones to keep the pods up. Instead, all of the workers were destroyed and then provisioned again, causing a few minutes of downtime.
How would I go about changing the instance type via Terraform next time while keeping the pods running?
Answer 1
Score: 2
Since node group instance types are immutable (as mentioned in this SO answer), Terraform is most likely deleting the node group and recreating it, deleting all of the pods running on it in the process.
To avoid this, you will have to:
- Add a new node group with the new instance type.
- Gradually cordon and drain all nodes in the old node group, and make sure the evicted pods are scheduled as expected on the new node group.
- Once all pods are scheduled on the new node group, delete the old node group.
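As a sketch, the first step might look like the following when using the terraform-aws-eks module. The group names, instance types, and sizes here are illustrative assumptions; check the exact argument names against the module version you are running:

```hcl
# Sketch for the terraform-aws-eks module: run both node groups side by side
# while migrating. Names, instance types, and sizes are assumptions.
eks_managed_node_groups = {
  # Existing group, kept temporarily so pods have somewhere to run.
  general = {
    instance_types = ["t3.large"]
    min_size       = 2
    max_size       = 5
    desired_size   = 2
  }

  # New group with the CPU-oriented instance type.
  compute = {
    instance_types = ["c5.xlarge"]
    min_size       = 2
    max_size       = 5
    desired_size   = 2
  }
}
```

After `terraform apply` brings the new group up, cordon and drain the old nodes one at a time (`kubectl cordon <node>` followed by `kubectl drain <node> --ignore-daemonsets --delete-emptydir-data`), verify the pods land on the new group, then remove the old group's block and apply again.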