Facing scaling issues in a GKE Autopilot cluster

Question

I'm facing scaling issues in a GKE Autopilot cluster. I'm getting this error:

> node scale up failed: pod is at risk of not being scheduled

I see this issue only with the Autopilot cluster.

I tried the basic troubleshooting steps, but the issue is still not resolved.

Answer 1

Score: 0

As per the official GKE documentation, this issue occurs when serial port logging is disabled in your Google Cloud project. GKE Autopilot clusters require serial port logging to effectively debug node issues. If serial port logging is disabled, Autopilot can't provision nodes to run your workloads.

> Serial port logging might be disabled at the organization level through an organization policy that enforces the compute.disableSerialPortLogging constraint. Serial port logging could also be disabled at the project or virtual machine (VM) instance level.
>
> To resolve this issue do the following:
>
> 1. Ask your Google Cloud organization policy administrator to remove the compute.disableSerialPortLogging constraint in the project with your Autopilot cluster.
> 2. If you don't have an organization policy that enforces this constraint, try to enable serial port logging in your project metadata. This action requires the compute.projects.setCommonInstanceMetadata IAM permission.

Refer to the official documentation on troubleshooting Autopilot clusters for more information.
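
To make the second step more concrete, here is a minimal Python sketch, not an official tool. It assumes you have saved the output of `gcloud compute project-info describe --format=json` and want to see whether the project-wide metadata explicitly sets `serial-port-logging-enable` to `true`. The function names, the sample data, and the project ID `my-project` are all hypothetical. The sketch only inspects project metadata; it does not look at organization policies or per-VM metadata.

```python
import json

# Metadata key named in the GKE troubleshooting documentation.
SERIAL_LOGGING_KEY = "serial-port-logging-enable"


def serial_port_logging_explicitly_enabled(project_info: dict) -> bool:
    """Return True if project-wide metadata sets the key to "true".

    `project_info` is assumed to be the parsed JSON printed by
    `gcloud compute project-info describe --format=json`.
    A missing key returns False, which only means "not explicitly
    enabled here"; it says nothing about org policy or VM-level settings.
    """
    metadata = project_info.get("commonInstanceMetadata", {})
    for item in metadata.get("items", []):
        if item.get("key") == SERIAL_LOGGING_KEY:
            return str(item.get("value", "")).strip().lower() == "true"
    return False


def enable_command(project_id: str) -> list:
    """Build (but do not run) the gcloud command that sets the key.

    Running it requires the compute.projects.setCommonInstanceMetadata
    IAM permission mentioned in the answer.
    """
    return [
        "gcloud", "compute", "project-info", "add-metadata",
        "--metadata", f"{SERIAL_LOGGING_KEY}=true",
        f"--project={project_id}",
    ]


# Hypothetical sample, shaped like the describe output, for illustration only.
SAMPLE_JSON = """
{
  "name": "my-project",
  "commonInstanceMetadata": {
    "items": [
      {"key": "serial-port-logging-enable", "value": "false"}
    ]
  }
}
"""

sample_info = json.loads(SAMPLE_JSON)
if not serial_port_logging_explicitly_enabled(sample_info):
    # Print the command so it can be reviewed before anyone runs it.
    print(" ".join(enable_command(sample_info["name"])))
```

The helper only builds and prints the command instead of running it, so it can be reviewed first. If an organization policy enforces compute.disableSerialPortLogging, the policy administrator still has to lift the constraint as described in step 1.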

huangapple
  • Posted on July 6, 2023 at 14:56:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76626201.html