January 3, 2020 18:43:02


kubelet.service: Unit entered failed state in not ready state node error from kubernetes cluster

Question


I am trying to deploy Spring Boot microservices to a Kubernetes cluster with 1 master and 2 worker nodes. When I check the node state with sudo kubectl get nodes, one of my worker nodes is not ready: its status shows NotReady.

To troubleshoot, I ran the following command:

sudo journalctl -u kubelet

The output contains kubelet.service: Unit entered failed state, and the kubelet service stops. The following is the full output of sudo journalctl -u kubelet:

-- Logs begin at Fri 2020-01-03 04:56:18 EST, end at Fri 2020-01-03 05:32:47 EST. --
Jan 03 04:56:25 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:31 MILDEVKUB050 kubelet[970]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
Jan 03 04:56:31 MILDEVKUB050 kubelet[970]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.053962     970 server.go:416] Version: v1.17.0
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.084061     970 plugins.go:100] No cloud provider specified.
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.235928     970 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.280173     970 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curre
Jan 03 04:56:38 MILDEVKUB050 kubelet[970]: I0103 04:56:38.107966     970 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Jan 03 04:56:38 MILDEVKUB050 kubelet[970]: F0103 04:56:38.109401     970 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable swa
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.901632    1433 server.go:416] Version: v1.17.0
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.907654    1433 plugins.go:100] No cloud provider specified.
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.907806    1433 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.947107    1433 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curr
Jan 03 04:56:49 MILDEVKUB050 kubelet[1433]: I0103 04:56:49.263777    1433 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to
Jan 03 04:56:49 MILDEVKUB050 kubelet[1433]: F0103 04:56:49.264219    1433 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable sw
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.712729    1500 server.go:416] Version: v1.17.0
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.714927    1500 plugins.go:100] No cloud provider specified.
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.715248    1500 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.763508    1500 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curr
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.956706    1500 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: F0103 04:56:59.957078    1500 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable sw
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.


I tried restarting the kubelet, but the node state did not change. It is still NotReady.

Update

When I run systemctl list-units --type=swap --state=active, I get the following output:

docker@MILDEVKUB040:~$ systemctl list-units --type=swap --state=active
UNIT                                            LOAD   ACTIVE SUB    DESCRIPTION
dev-mapper-MILDEVDCR01\x2d\x2dvg\x2dswap_1.swap loaded active active /dev/mapper/MILDEVDCR01--vg-swap_1

Important

Whenever a node becomes NotReady like this, I have to disable swap and then reload the daemon and restart the kubelet. After that the node becomes Ready, but later I have to repeat the same steps.

How can I find a permanent solution for this?

Answer 1

Score: 5


failed to run Kubelet: running with swap on is not supported, please disable swap

You need to disable swap on the system for the kubelet to work. You can disable swap with sudo swapoff -a.

On systemd-based systems, swap partitions can also be enabled through swap units, which are activated again whenever systemd reloads, even if you have turned swap off with swapoff -a.

https://www.freedesktop.org/software/systemd/man/systemd.swap.html

Check whether you have any active swap units with systemctl list-units --type=swap --state=active.

You can permanently disable an active swap unit with systemctl mask <unit name>.

Note: Do not use systemctl disable <unit name> to disable the swap unit, because the swap unit will be activated again when systemd reloads. Use only systemctl mask <unit name>.
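
As a rough sketch of the check-and-mask step, the helper below pulls the swap unit names out of the systemctl listing so that each one can be passed to systemctl mask. The function name swap_units_from_listing is made up for illustration. The commented pipeline assumes the --plain and --no-legend options are available in your systemctl version, and that you have root access.

```shell
#!/bin/sh
# Hypothetical helper, not part of systemd: reads `systemctl list-units`
# output on stdin and prints the name of every unit ending in ".swap".
swap_units_from_listing() {
    # The unit name is the first column; header and footer rows do not
    # end in ".swap", so they are skipped.
    awk '$1 ~ /\.swap$/ { print $1 }'
}

# Intended use on a real node (needs root, so it is left commented out):
#   systemctl list-units --type=swap --state=active --plain --no-legend |
#       swap_units_from_listing |
#       while IFS= read -r unit; do sudo systemctl mask "$unit"; done
```

Quoting "$unit" and reading with read -r matters here, because escaped unit names such as the one in the question contain backslashes.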

To make sure swap does not get re-enabled when your system reboots, whether because of a power cycle or for any other reason, remove or comment out the swap entries in /etc/fstab.
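
One way to comment out the swap entries without hand-editing is sketched below. It assumes the usual fstab layout, where the third whitespace-separated field is the filesystem type. The function name comment_out_swap_entries and the backup file name are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: prints an fstab-style file with every active swap
# entry commented out. It writes to stdout and does not modify the input file.
comment_out_swap_entries() {
    # $1: path to the fstab-style file to read
    awk '
        /^[[:space:]]*#/ { print; next }           # already a comment: keep as-is
        $3 == "swap"     { print "# " $0; next }   # type field is swap: comment out
        { print }                                  # everything else unchanged
    ' "$1"
}

# Intended use on a real node (needs root, so it is left commented out):
#   sudo cp /etc/fstab /etc/fstab.bak
#   comment_out_swap_entries /etc/fstab.bak | sudo tee /etc/fstab >/dev/null
```

Keeping a backup copy first means a mistake in the edited file can be undone by copying /etc/fstab.bak back.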

Summarizing:

1. Run sudo swapoff -a.
2. Check whether you have swap units with systemctl list-units --type=swap --state=active. If there are any active swap units, mask them with systemctl mask <unit name>.
3. Remove the swap entries from /etc/fstab.

Answer 2

Score: 1


The root cause is the swap space. To disable it completely, follow these steps:

1. Run swapoff -a. This disables swap immediately, but swap will be activated again on restart.
2. Remove any swap entry from /etc/fstab.
3. Reboot the system.

> If the swap is gone, good. If, for some reason, it is still here, you
> have to remove the swap partition. Repeat steps 1 and 2 and, after
> that, use fdisk or parted to remove the (now unused) swap partition.
> Use great care here: removing the wrong partition will have disastrous
> effects!

4. Reboot.

This should resolve your issue.
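
To tell whether the swap is really gone after the reboot, one option is to look at /proc/swaps, which on Linux lists the active swap areas below a header line. The following is a minimal sketch. The function name swap_is_active is hypothetical, and the optional file argument exists only so that the check can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if at least one swap area is listed,
# and 1 if only the header line (or nothing) is present.
swap_is_active() {
    # $1: file in /proc/swaps format; defaults to /proc/swaps itself.
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit (found ? 0 : 1) }' "${1:-/proc/swaps}"
}

# Intended use on a node after rebooting (left commented out):
#   if swap_is_active; then
#       echo "swap is still enabled"
#   else
#       echo "swap is off"
#   fi
```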

Answer 3

Score: 0


Removing /etc/fstab causes errors on the VM, so I think we should find another way to solve this issue. I tried removing fstab, and every command (install, ping, and others) failed.
