GKE log errors about gke-metrics-agent and UAS
Question
I'm using a private GKE cluster (Version 1.23.14-gke.1800). I have the following errors in kube-system gke-metrics-agent pod logs:
error uasexporter/exporter.go:190 Error exporting metrics to UAS {"kind": "exporter", "name": "uas", "error": "reading from stream failed: rpc error: code = PermissionDenied desc = The caller does not have permission"}
error uasexporter/exporter.go:226 failed to get response from UAS {"kind": "exporter", "name": "uas", "error": "rpc error: code = PermissionDenied desc = The caller does not have permission"}
app gke-metrics-agent
component gke-metrics-agent
container gke-metrics-agent
filename /var/log/pods/kube-system_gke-metrics-agent-9rbfv_6896b214-31d2-43bb-b15d-a8e1b122d41d/gke-metrics-agent/0.log
job kube-system/gke-metrics-agent
namespace kube-system
node_name gke-gke-production-production-88f13984-h83x
pod gke-metrics-agent-9rbfv
stream stderr
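When triaging many of these entries it can help to pull the gRPC status out of each line. The following is a minimal sketch, assuming the agent always logs in the shape of the two lines quoted above (level, source location, message, then a JSON object); the regular expressions and the `parse_agent_line` helper are hypothetical and inferred only from those two lines, not from the agent's source.

```python
import json
import re

# Log line copied from the question: level, source location, message, JSON fields.
LINE = (
    'error uasexporter/exporter.go:226 failed to get response from UAS '
    '{"kind": "exporter", "name": "uas", "error": "rpc error: code = PermissionDenied '
    'desc = The caller does not have permission"}'
)

# Hypothetical patterns, inferred only from the two log lines shown above.
LINE_RE = re.compile(
    r'^(?P<level>\w+)\s+(?P<source>\S+:\d+)\s+(?P<message>.*?)\s+(?P<fields>\{.*\})$'
)
GRPC_RE = re.compile(r'rpc error: code = (?P<code>\w+) desc = (?P<desc>.*)$')


def parse_agent_line(line):
    """Split one agent log line into level, source, message, and gRPC status.

    Returns None when the line does not match the assumed shape.
    """
    match = LINE_RE.match(line.strip().strip('*'))
    if match is None:
        return None
    fields = json.loads(match.group('fields'))
    grpc = GRPC_RE.search(fields.get('error', ''))
    return {
        'level': match.group('level'),
        'source': match.group('source'),
        'message': match.group('message'),
        'exporter': fields.get('name'),
        'grpc_code': grpc.group('code') if grpc else None,
        'grpc_desc': grpc.group('desc') if grpc else None,
    }


if __name__ == '__main__':
    print(parse_agent_line(LINE))
```

Grouping the parsed entries by `grpc_code` and `exporter` would show whether every failure is the same PermissionDenied from the `uas` exporter or whether other errors are mixed in.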
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-12-07T10:20:55Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  namespace: kube-system
  resourceVersion: "444"
  uid: ...
secrets: ..
- name: gke-metrics-agent-token-6zhvq
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "452"
  uid: ...
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gke-metrics-agent
subjects:
- kind: ServiceAccount
  name: gke-metrics-agent
  namespace: kube-system
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "67979037"
  uid: ...
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - watch
- apiGroups:
  - policy
  resourceNames:
  - gce.gke-metrics-agent
  resources:
  - podsecuritypolicies
  verbs:
  - use
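The ClusterRole above can be read as a small allow-list over Kubernetes API resources. The sketch below transcribes its three rules and checks individual requests against them, assuming a deliberately simplified matching model (exact matches plus `*`, no subresources, no non-resource URLs, no role aggregation). The `allows` helper is hypothetical and not part of any Kubernetes client library. It only describes what the role grants inside the cluster API; it says nothing about how the UAS endpoint authorizes the agent.

```python
# ClusterRole rules transcribed from the YAML in the question.
RULES = [
    {"apiGroups": [""], "resources": ["nodes"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["list", "watch"]},
    {
        "apiGroups": ["policy"],
        "resources": ["podsecuritypolicies"],
        "resourceNames": ["gce.gke-metrics-agent"],
        "verbs": ["use"],
    },
]


def _matches(allowed, value):
    """True if value is listed explicitly or covered by the '*' wildcard."""
    return value in allowed or "*" in allowed


def allows(rules, api_group, resource, verb, name=None):
    """Simplified RBAC check (hypothetical helper, not the real authorizer).

    Returns True if any rule matches the API group, resource, verb and,
    when the rule restricts resourceNames, the object name.
    """
    for rule in rules:
        if not _matches(rule["apiGroups"], api_group):
            continue
        if not _matches(rule["resources"], resource):
            continue
        if not _matches(rule["verbs"], verb):
            continue
        names = rule.get("resourceNames")
        if names and name not in names:
            continue
        return True
    return False


if __name__ == "__main__":
    print(allows(RULES, "", "pods", "list"))   # granted by the second rule
    print(allows(RULES, "", "pods", "get"))    # no rule lists "get" on pods
```

Under this reading the role covers only nodes, pods, and one pod security policy, all of which are objects served by the cluster's own API server.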
I think gke-metrics-agent is an official DaemonSet that ships automatically with GKE.
It is obviously some kind of permission problem, but I don't even know what UAS means.
I can't find any meaningful information in the GCP documentation or elsewhere on the Internet.
I tried granting some additional cluster roles (system:gke-uas-metrics-reader, external-metrics-reader) to the current gke-metrics-agent service account, but the problem persists.
From time to time I also see the following problems in my cluster:
Kubernetes aggregated API v1beta1.metrics.k8s.io/default is reporting errors
Kubernetes aggregated API v1beta1.metrics.k8s.io/default has been only 75% available over the last 10m
I think they are connected with this issue.
I would be very thankful if someone could give me at least some direction.
Thank you for your time and excuse my English!
Answer 1
Score: 0
UAS stands for Unified Autoscaling Platform. It provides predictive and scheduled size recommendations to the Autoscaler backend, giving the zonal Autoscaler an additional signal for Predictive Autoscaling and Scheduled Autoscaling.
There is currently a known issue related to UAS. It is caused by a LoggingMonitorConfig issue that Google is working on. For further updates, follow the above link. You can also post a comment there and ask whether a workaround is available for now.
If you find an issue with a Google product or want to raise a feature request, use the Public Issue Tracker.