k8s, without information about CPU and Memory
Question
I got empty values for CPU and Memory when I used igztop to check the running pods in the iguazio/mlrun solution. See the first line in the output below for the pod *m6vd9:
[ jist @ iguazio-system 07:41:43 ]->(0) ~ $ igztop -s cpu
+--------------------------------------------------------------+--------+------------+-----------+---------+-------------+-------------+
| NAME | CPU(m) | MEMORY(Mi) | NODE | STATUS | MLRun Proj. | MLRun Owner |
+--------------------------------------------------------------+--------+------------+-----------+---------+-------------+-------------+
| xxxxxxxxxxxxxxxx7445dfc774-m6vd9 | | | k8s-node3 | Running | | |
| xxxxxx-jupyter-55b565cc78-7bjfn | 27 | 480 | k8s-node1 | Running | | |
| nuclio-xxxxxxxxxxxxxxxxxxxxxxxxxx-756fcb7f74-h6ttk | 15 | 246 | k8s-node3 | Running | | |
| mlrun-db-7bc6bcf796-64nz7 | 13 | 717 | k8s-node2 | Running | | |
| xxxx-jupyter-c4cccdbd8-slhlx | 10 | 79 | k8s-node1 | Running | | |
| v3io-webapi-scj4h | 8 | 1817 | k8s-node2 | Running | | |
| v3io-webapi-56g4d | 8 | 1827 | k8s-node1 | Running | | |
| spark-worker-8d877878c-ts2t7 | 8 | 431 | k8s-node1 | Running | | |
| provazio-controller-644f5784bf-htcdk | 8 | 34 | k8s-node1 | Running | | |
+--------------------------------------------------------------+--------+------------+-----------+---------+-------------+-------------+

It was also not possible to see performance metrics (CPU, Memory, I/O) for this pod in Grafana.
Do you know how I can resolve this issue without restarting the whole node, and what the root cause is?
Answer 1
Score: 1
The following troubleshooting steps will help you resolve the issue (command sketches for steps 1-4 follow the list):
1. Check whether you can see the CPU and memory requests and limits of the pod using the describe command:

kubectl describe pods my-pod
2. Check whether you can view the CPU and memory usage of all pods and nodes using the following commands:

kubectl top pod
kubectl top node
3. Check whether the metrics server is running using the following commands:

kubectl get apiservices v1beta1.metrics.k8s.io
kubectl get pod -n kube-system -l k8s-app=metrics-server
4. Check the CPU and memory of the pod using the following Prometheus queries:

> CPU utilisation per pod:
>
> sum(irate(container_cpu_usage_seconds_total{container!="POD", container=~".+"}[2m])) by (pod)
>
> RAM usage per pod:
>
> sum(container_memory_usage_bytes{container!="POD", container=~".+"}) by (pod)
5. Check the logs of the pod and the node; if you find any errors, attach those logs for further troubleshooting.
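For step 1, `kubectl describe` prints a lot of output; a minimal sketch like the one below (using the same placeholder pod name `my-pod` as above) extracts only each container's resource requests and limits:

```bash
# Print only the resources block of each container in the pod
# ("my-pod" is the placeholder name from step 1; add -n <namespace> if needed).
kubectl get pod my-pod \
  -o jsonpath='{range .spec.containers[*]}{.name}{" => "}{.resources}{"\n"}{end}'
```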
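Steps 2 and 3 can be combined into one quick health check of the metrics pipeline. This is a sketch assuming metrics-server runs in kube-system with the label used above; if `kubectl top` shows the same blank values as igztop, the problem lies in the metrics pipeline (metrics-server, or the kubelet/cAdvisor on k8s-node3) rather than in igztop itself:

```bash
# Per-pod and per-node usage; a missing or empty row points at the metrics pipeline.
kubectl top pod --all-namespaces --sort-by=cpu
kubectl top node

# Is the metrics API registered and marked Available?
kubectl get apiservices v1beta1.metrics.k8s.io

# Is metrics-server itself running, and does it log scrape errors?
kubectl get pod -n kube-system -l k8s-app=metrics-server
kubectl logs -n kube-system -l k8s-app=metrics-server --tail=50

# Query the metrics API directly; the node hosting the broken pod (k8s-node3
# in the question) should appear here with CPU and memory figures.
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```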
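The step-4 queries can also be run outside Grafana, directly against the Prometheus HTTP API, to tell whether the metrics are missing at the source or only in the dashboard. A sketch, assuming a hypothetical Prometheus address in `PROM_URL` (substitute your actual endpoint); if the *m6vd9 pod is absent from the results, the exporter side (kubelet/cAdvisor) is at fault rather than Grafana:

```bash
# The Prometheus service address below is an assumption -- substitute your own.
PROM_URL="http://prometheus.monitoring.svc:9090"

# CPU utilisation per pod.
curl -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=sum(irate(container_cpu_usage_seconds_total{container!="POD", container=~".+"}[2m])) by (pod)'

# RAM usage per pod.
curl -sG "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=sum(container_memory_usage_bytes{container!="POD", container=~".+"}) by (pod)'
```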
Answer 2
Score: 0
It seems to be an issue with the kubelet; the best approach is to follow the step-by-step scenario (see the diagram in the PDF).
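Since the question asks for a fix without restarting the whole node: per-pod CPU and memory stats come from cAdvisor, which is embedded in the kubelet, so restarting just the kubelet service is usually enough and leaves the running pods in place. A minimal sketch, assuming SSH access to the affected node and a systemd-managed kubelet:

```bash
# On the node that hosts the pod with the missing stats (k8s-node3 here).
ssh k8s-node3

# Inspect kubelet state and recent errors before touching anything.
systemctl status kubelet
journalctl -u kubelet --since "1 hour ago" | grep -iE 'error|fail|cadvisor'

# Restart only the kubelet; running containers keep running, and the embedded
# cAdvisor starts collecting per-pod stats again.
sudo systemctl restart kubelet
```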