GKE log errors about gke-metrics-agent and UAS

Question

I'm using a private GKE cluster (Version 1.23.14-gke.1800). I have the following errors in kube-system gke-metrics-agent pod logs:

error uasexporter/exporter.go:190 Error exporting metrics to UAS {"kind": "exporter", "name": "uas", "error": "reading from stream failed: rpc error: code = PermissionDenied desc = The caller does not have permission"}

error uasexporter/exporter.go:226 failed to get response from UAS {"kind": "exporter", "name": "uas", "error": "rpc error: code = PermissionDenied desc = The caller does not have permission"}

app: gke-metrics-agent
component: gke-metrics-agent
container: gke-metrics-agent
filename: /var/log/pods/kube-system_gke-metrics-agent-9rbfv_6896b214-31d2-43bb-b15d-a8e1b122d41d/gke-metrics-agent/0.log
job: kube-system/gke-metrics-agent
namespace: kube-system
node_name: gke-gke-production-production-88f13984-h83x
pod: gke-metrics-agent-9rbfv
stream: stderr
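
To make the structure of these log lines easier to see, here is a minimal Python sketch that splits one of them into its parts and pulls out the gRPC-style status code. The regular expressions and the `parse_agent_log` helper are assumptions made for illustration, based only on the two lines quoted above; they are not part of gke-metrics-agent.

```python
import json
import re

# Assumed line layout, inferred from the two log lines quoted above:
# "<level> <file>:<line> <message> {<json fields>}"
LOG_LINE = re.compile(
    r"^(?P<level>\w+)\s+(?P<source>\S+:\d+)\s+(?P<message>.*?)\s+(?P<fields>\{.*\})$"
)
# gRPC-style status text embedded in the "error" field.
GRPC_STATUS = re.compile(r"rpc error: code = (?P<code>\w+) desc = (?P<desc>.*)$")


def parse_agent_log(line):
    """Return the parts of one agent log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = json.loads(match.group("fields"))
    status = GRPC_STATUS.search(fields.get("error", ""))
    return {
        "level": match.group("level"),
        "source": match.group("source"),
        "message": match.group("message"),
        "exporter": fields.get("name"),
        "grpc_code": status.group("code") if status else None,
        "grpc_desc": status.group("desc") if status else None,
    }


SAMPLE = (
    'error uasexporter/exporter.go:190 Error exporting metrics to UAS '
    '{"kind": "exporter", "name": "uas", "error": "reading from stream failed: '
    'rpc error: code = PermissionDenied desc = The caller does not have permission"}'
)

parsed = parse_agent_log(SAMPLE)
print(parsed["exporter"], parsed["grpc_code"])  # prints: uas PermissionDenied
```

The extracted code, `PermissionDenied`, is a gRPC status returned to the `uas` exporter by the remote endpoint it calls, so the denial appears to originate outside the pod rather than in the agent's own code.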

apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-12-07T10:20:55Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  namespace: kube-system
  resourceVersion: "444"
  uid: ...
secrets: ..
- name: gke-metrics-agent-token-6zhvq

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "452"
  uid: ...
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gke-metrics-agent
subjects:
- kind: ServiceAccount
  name: gke-metrics-agent
  namespace: kube-system

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "67979037"
  uid: ...
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - watch
- apiGroups:
  - policy
  resourceNames:
  - gce.gke-metrics-agent
  resources:
  - podsecuritypolicies
  verbs:
  - use

I believe gke-metrics-agent is an official DaemonSet that GKE deploys automatically.
This is obviously a permission problem, but I don't even know what UAS means.
I can't find any meaningful information in the GCP documentation or elsewhere on the Internet.
I tried granting some additional cluster roles (system:gke-uas-metrics-reader, external-metrics-reader) to the existing gke-metrics-agent service account, but the problem persists.

From time to time I also see the following problems reported in my cluster:
Kubernetes aggregated API v1beta1.metrics.k8s.io/default is reporting errors
Kubernetes aggregated API v1beta1.metrics.k8s.io/default has been only 75% available over the last 10m
I think they are related to this issue.

I would be very thankful if someone could give me at least some direction.
Thank you for your time, and excuse my English!

Answer 1

Score: 0

UAS stands for Unified Autoscaling Platform. It provides predictive and scheduled size recommendations to the Autoscaler backend, giving the zonal Autoscaler an additional signal for Predictive Autoscaling and Scheduled Autoscaling.

There is currently a known issue related to UAS, caused by a LoggingMonitorConfig problem that Google is working on. For further updates, follow that known-issue report. You can also post a comment there asking whether a workaround is available in the meantime.

If you find an issue with a Google product, or want to raise a feature request, use the Public Issue Tracker.

huangapple
  • Posted on February 16, 2023 at 16:34:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75469611.html