Question


I have a workload running on 6 nodes in a single node pool, with some number of services, say 3n + k. When I scale them down to n + k, the load on every node drops from 80% to roughly 45-50%. I expected my services to be rescheduled so that the total number of nodes would shrink, but this does not happen. Why? Do I need to wait longer, or do I need to take some other action?

Answer 1

Score: 1


Once the load drops to 45-50%, GKE's cluster autoscaler (assuming it is enabled for the node pool) should eventually consolidate the workloads onto fewer nodes and remove the nodes that are no longer needed. This does not happen immediately: the autoscaler waits until a node has stayed underutilized for a while (roughly 10 minutes by default) before draining it, in case the traffic comes back. Keep in mind that the autoscaler judges utilization by the pods' resource requests rather than by live usage, so a node sitting right around 50% may not count as underutilized under the default settings. If enough time has passed and node downscaling is still not occurring, it could be due to the following reasons:

Pod Disruption Budget: If a Pod Disruption Budget (PDB) is set for the workloads and evicting a pod would violate it, GKE cannot evict that pod, which blocks the node drain. A PDB defines the minimum number of pods that must stay available during a voluntary disruption, so an overly strict PDB keeps the node in place.
Pods using node storage: If a pod uses local storage on a node, the autoscaler avoids evicting that pod to prevent data loss. In that case the node is not removed until the pod can be terminated safely.
Safe-to-evict annotation: Adding the annotation below to the pods (for a Deployment, to its pod template) tells the autoscaler that they may be evicted, which helps when pods are not being evicted as expected.

# For a Deployment, set the same annotation under spec.template.metadata.annotations.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
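For a Deployment, the annotation has to end up on the pods themselves, so it belongs in the pod template rather than in the Deployment's own metadata. A minimal sketch of where it goes, assuming a hypothetical Deployment called `my-service` with a placeholder image (neither comes from the original question):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service              # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
      annotations:
        # Applied to every pod created from this template.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: app
          image: nginx:1.25     # placeholder image
```

Only mark pods as safe to evict if losing their local data on eviction is acceptable.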

To resolve issues with node downscaling and workload rearrangement, check and address the points above. If Pod Disruption Budgets are in place, consider relaxing them to allow more flexibility during node drain. If pods are using local storage, consider migrating them to network-attached storage or another persistent storage solution so that they can be moved between nodes. Once these issues are addressed, the cluster autoscaler should be able to scale the node pool down based on resource utilization.
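As an illustration of a PDB that still leaves room for a node drain, the sketch below allows one pod at a time to be evicted. The name `my-service-pdb` and the `app: my-service` label are hypothetical and would need to match your own workload:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb          # hypothetical name
spec:
  maxUnavailable: 1             # at most one pod may be down during a voluntary disruption
  selector:
    matchLabels:
      app: my-service           # assumed label on the workload's pods
```

By contrast, a PDB whose `minAvailable` equals the replica count leaves no pod that can be evicted, so a node running one of those pods cannot be drained.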
