Cluster Autoscaler pod crashing: timeout connecting to sts.us-west-1.amazonaws.com

Question

I am following this document to deploy the Cluster Autoscaler in EKS: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html

The cluster's EKS version is 1.24. Public traffic is allowed on the open internet, and we have whitelisted the .amazonaws.com domain in the Squid proxy.

I suspect something may be wrong with the role or policy configuration.

Error in the pod:

> F0208 05:39:52.442470 1 aws_cloud_provider.go:386] Failed to
> generate AWS EC2 Instance Types: WebIdentityErr: failed to retrieve
> credentials caused by: RequestError: send request failed caused by:
> Post "https://sts.us-west-1.amazonaws.com/": dial tcp
> 176.32.112.54:443: i/o timeout

The service account has the annotation in place to use the IAM role.

Output of `kubectl describe serviceaccount cluster-autoscaler -n kube-system`:

Name:                cluster-autoscaler
Namespace:           kube-system
Labels:              k8s-addon=cluster-autoscaler.addons.k8s.io
                     k8s-app=cluster-autoscaler
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<ID>:role/irsa-clusterautoscaler
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>

Answer 1

Score: 1

It was solved by adding the proxy details to the container `env` of the Deployment. This step is missing from the official documentation, and it would be worth adding there as a hint. The pod did not pick up the proxy settings available on the node; it expected them to be configured on the container itself.
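
The answer does not show the exact change. One way to apply it, sketched below under stated assumptions, is a strategic-merge patch that adds `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` to the container's `env`. Assumptions: the Deployment and its container are both named `cluster-autoscaler` in `kube-system` (as in the manifest the AWS guide points to, as far as we know); the proxy URL and `NO_PROXY` values are hypothetical placeholders, and the function name `proxyPatch` is illustrative. `NO_PROXY` should cover in-cluster destinations such as the Kubernetes API server's service IP, because the Kubernetes Go client also honours these variables and would otherwise send API traffic to Squid.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type envVar struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type container struct {
	Name string   `json:"name"`
	Env  []envVar `json:"env"`
}

// patch mirrors only the Deployment fields we want to change:
// spec.template.spec.containers[].env
type patch struct {
	Spec struct {
		Template struct {
			Spec struct {
				Containers []container `json:"containers"`
			} `json:"spec"`
		} `json:"template"`
	} `json:"spec"`
}

// proxyPatch builds a strategic-merge patch body that sets the proxy
// variables on the named container.
func proxyPatch(containerName, proxyURL, noProxy string) ([]byte, error) {
	var p patch
	p.Spec.Template.Spec.Containers = []container{{
		Name: containerName,
		Env: []envVar{
			{Name: "HTTP_PROXY", Value: proxyURL},
			{Name: "HTTPS_PROXY", Value: proxyURL},
			{Name: "NO_PROXY", Value: noProxy},
		},
	}}
	return json.Marshal(p)
}

func main() {
	// Hypothetical values: replace with your Squid address and your
	// cluster's service CIDR and other internal destinations.
	body, err := proxyPatch(
		"cluster-autoscaler",
		"http://squid.example.internal:3128",
		"localhost,127.0.0.1,169.254.169.254,10.100.0.0/16,.svc,.cluster.local",
	)
	if err != nil {
		panic(err)
	}
	fmt.Printf("kubectl -n kube-system patch deployment cluster-autoscaler -p '%s'\n", body)
}
```

`kubectl patch` uses a strategic merge by default, which merges `containers` and `env` entries by name, so existing variables on the container should be preserved; editing the manifest's `env` section directly achieves the same result.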

huangapple
  • Posted on 8 February 2023 at 17:28:55
  • Source: https://go.coder-hub.com/75383669.html