k8s, without information about CPU and Memory
Question
I got empty values for CPU and Memory when I used igztop to check the running pods in the iguazio/mlrun solution. See the first line in the output below for the pod *m6vd9:
[ jist @ iguazio-system 07:41:43 ]->(0) ~ $ igztop -s cpu
+--------------------------------------------------------------+--------+------------+-----------+---------+-------------+-------------+
| NAME | CPU(m) | MEMORY(Mi) | NODE | STATUS | MLRun Proj. | MLRun Owner |
+--------------------------------------------------------------+--------+------------+-----------+---------+-------------+-------------+
| xxxxxxxxxxxxxxxx7445dfc774-m6vd9 | | | k8s-node3 | Running | | |
| xxxxxx-jupyter-55b565cc78-7bjfn | 27 | 480 | k8s-node1 | Running | | |
| nuclio-xxxxxxxxxxxxxxxxxxxxxxxxxx-756fcb7f74-h6ttk | 15 | 246 | k8s-node3 | Running | | |
| mlrun-db-7bc6bcf796-64nz7 | 13 | 717 | k8s-node2 | Running | | |
| xxxx-jupyter-c4cccdbd8-slhlx | 10 | 79 | k8s-node1 | Running | | |
| v3io-webapi-scj4h | 8 | 1817 | k8s-node2 | Running | | |
| v3io-webapi-56g4d | 8 | 1827 | k8s-node1 | Running | | |
| spark-worker-8d877878c-ts2t7 | 8 | 431 | k8s-node1 | Running | | |
| provazio-controller-644f5784bf-htcdk | 8 | 34 | k8s-node1 | Running | | |
+--------------------------------------------------------------+--------+------------+-----------+---------+-------------+-------------+

It was also not possible to see performance metrics (CPU, Memory, I/O) for this pod in Grafana.
Do you know how I can resolve this issue without restarting the whole node, and what the root cause is?
Answer 1
Score: 1
The following troubleshooting steps will help you resolve the issue (command sketches for steps 1-4 follow the list):
1. Check whether you can see the CPU and memory requests and limits of the pod using the describe command:

kubectl describe pods my-pod
2. Check whether you can view the CPU and memory usage of all pods and nodes using the following commands:

kubectl top pod
kubectl top node
3. Check whether the metrics server is running using the following commands:

kubectl get apiservices v1beta1.metrics.k8s.io
kubectl get pod -n kube-system -l k8s-app=metrics-server
4. Check the CPU and memory of the pod using the following Prometheus queries:

> CPU utilisation per pod:
>
> sum(irate(container_cpu_usage_seconds_total{container!="POD", container=~".+"}[2m])) by (pod)
>
> RAM usage per pod:
>
> sum(container_memory_usage_bytes{container!="POD", container=~".+"}) by (pod)
5. Check the logs of the pod and the node; if you find any errors, attach those logs for further troubleshooting.
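For step 1, `kubectl describe` prints a lot of output; a minimal sketch like the one below (using the same placeholder pod name `my-pod` as above) extracts only each container's resource requests and limits:

```bash
# Print only the resources block of each container in the pod
# ("my-pod" is the placeholder name from step 1; add -n <namespace> if needed).
kubectl get pod my-pod \
  -o jsonpath='{range .spec.containers[*]}{.name}{" => "}{.resources}{"\n"}{end}'
```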
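Steps 2 and 3 can be combined into one quick health check of the metrics pipeline. This is a sketch assuming metrics-server runs in kube-system with the label used above; if `kubectl top` shows the same blank values as igztop, the problem lies in the metrics pipeline (metrics-server, or the kubelet/cAdvisor on k8s-node3) rather than in igztop itself:

```bash
# Per-pod and per-node usage; a missing or empty row points at the metrics pipeline.
kubectl top pod --all-namespaces --sort-by=cpu
kubectl top node

# Is the metrics API registered and marked Available?
kubectl get apiservices v1beta1.metrics.k8s.io

# Is metrics-server itself running, and does it log scrape errors?
kubectl get pod -n kube-system -l k8s-app=metrics-server
kubectl logs -n kube-system -l k8s-app=metrics-server --tail=50

# Query the metrics API directly; the node hosting the broken pod (k8s-node3
# in the question) should appear here with CPU and memory figures.
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```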
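The step-4 queries can also be run outside Grafana, directly against the Prometheus HTTP API, to tell whether the metrics are missing at the source or only in the dashboard. A sketch, assuming a hypothetical Prometheus address in `PROM_URL` (substitute your actual endpoint); if the *m6vd9 pod is absent from the results, the exporter side (kubelet/cAdvisor) is at fault rather than Grafana:

```bash
# The Prometheus service address below is an assumption -- substitute your own.
PROM_URL="http://prometheus.monitoring.svc:9090"

# CPU utilisation per pod.
curl -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=sum(irate(container_cpu_usage_seconds_total{container!="POD", container=~".+"}[2m])) by (pod)'

# RAM usage per pod.
curl -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=sum(container_memory_usage_bytes{container!="POD", container=~".+"}) by (pod)'
```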
Answer 2
Score: 0
It seems to be an issue with the kubelet; the best approach is to follow the step-by-step scenario (see the diagram in the PDF).
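Since the question asks for a fix without restarting the whole node: per-pod CPU and memory stats come from cAdvisor, which is embedded in the kubelet, so restarting just the kubelet service is usually enough and leaves the running pods in place. A minimal sketch, assuming SSH access to the affected node and a systemd-managed kubelet:

```bash
# On the node that hosts the pod with the missing stats (k8s-node3 here).
ssh k8s-node3

# Inspect kubelet state and recent errors before touching anything.
systemctl status kubelet
journalctl -u kubelet --since "1 hour ago" | grep -iE 'error|fail|cadvisor'

# Restart only the kubelet; running containers keep running, and the embedded
# cAdvisor starts collecting per-pod stats again.
sudo systemctl restart kubelet
```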