BUG: unable to handle kernel NULL pointer dereference at (null)

Question

I am running a Java application (TIBCO EBX) on a Linux server with 192 GB of RAM. The Java process keeps getting restarted, and the application goes into a hung state with high-memory alerts. We set the heap size to 176 GB, and the heap fills up after about 10 hours, with memory utilization never going down. If we restart the server, memory utilization drops. We tried to capture a kdump on the server to analyze the memory leak, and we see the entries below in vmcore-dmesg.txt. Can anyone tell us whether this is causing our memory leak and how we can fix it?

[  389.832835] SysRq : Trigger a crash
[  390.049124] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  390.050076] IP: [<ffffffffbb270326>] sysrq_handle_crash+0x16/0x20
[  390.050076] PGD 80000017c6c6e067 PUD 17fa9c8067 PMD 0

Our kernel version is:

$ uname -r
3.10.0-1062.52.2.el7.x86_64

$ uname -a
Linux sr001 3.10.0-1062.52.2.el7.x86_64 #1 SMP Thu Jul 8 09:03:01 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Any other advice or suggestions on where to look would be appreciated.

Answer 1

Score: 3

The kernel dmesg messages are a result of deliberately triggering a crash through the SysRq mechanism (either using the magic SysRq key combination Alt-SysRq followed by c, or by writing c to the "/proc/sysrq-trigger" file). (Those require the SysRq mechanism to be enabled in the kernel.) They are unrelated to the memory leak.
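As a small sketch of how to inspect the SysRq setting mentioned above, the script below reads /proc/sys/kernel/sysrq and describes the value. The helper name describe_sysrq is made up for this example. The interpretation (0 = disabled, 1 = everything enabled, anything else = a bitmask of allowed functions) follows the kernel's SysRq documentation as I understand it, so treat it as an assumption and check it against your distribution.

```shell
#!/bin/sh
# describe_sysrq is a hypothetical helper name used only in this sketch.
# It describes a value read from /proc/sys/kernel/sysrq.
describe_sysrq() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $1 (only selected functions enabled)" ;;
    esac
}

# Report the current setting if the file is available on this system.
if [ -r /proc/sys/kernel/sysrq ]; then
    describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
fi

# To reproduce the log lines from the question on a TEST machine only
# (as root; this panics the kernel immediately):
#   echo c > /proc/sysrq-trigger
```

The crash command is left commented out on purpose, since running it takes the machine down.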

When the kernel crash is triggered by the SysRq mechanism, the sysrq_handle_crash() function in "drivers/tty/sysrq.c" (since kernel 2.6.37) is executed. For the 3.10 kernel, the function is defined as follows:

static void sysrq_handle_crash(int key)
{
	char *killer = NULL;

	panic_on_oops = 1;	/* force panic */
	wmb();
	*killer = 1;
}

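That *killer = 1; line writes through a NULL pointer, which produces the oops reported in the BUG: message; because panic_on_oops has just been set to 1, the oops becomes a kernel panic. It is not really a bug, because it was deliberately done that way.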
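One rough way to confirm that a saved dump shows this deliberate crash rather than a genuine kernel fault is to look for the handler's name in the log. The sketch below assumes the file is called vmcore-dmesg.txt in the current directory (pass another path as the first argument); the function name is_sysrq_crash is invented for this example.

```shell
#!/bin/sh
# is_sysrq_crash is a hypothetical helper name used only in this sketch.
# It returns success (0) if the given dmesg dump mentions the SysRq crash handler.
is_sysrq_crash() {
    grep -q 'sysrq_handle_crash' "$1"
}

# The default file name is an assumption; pass your own path as $1.
log="${1:-vmcore-dmesg.txt}"
if [ -r "$log" ]; then
    if is_sysrq_crash "$log"; then
        echo "$log: crash was triggered deliberately via SysRq"
    else
        echo "$log: no SysRq crash handler found in the log"
    fi
fi
```

For the log in the question, the IP: line names sysrq_handle_crash, so the first message would be printed.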

No doubt the BUG: message will have alarmed some people. In kernel 5.0, the function was changed to avoid the deliberate null pointer dereference and just call panic() directly, as follows:

static void sysrq_handle_crash(int key)
{
	/* release the RCU read lock before crashing */
	rcu_read_unlock();

	panic("sysrq triggered crash\n");
}

(The rcu_read_unlock() call is only there because rcu_read_lock() is called before this function is called.)


I suspect the memory leak is probably in the application rather than in the kernel. One way to tell is by killing the application. If the memory leak is in the application, the memory usage should return to normal when its processes are killed. If the memory leak is in the kernel, the memory remains unavailable for further use until the kernel is rebooted.
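To make that comparison concrete, here is a minimal sketch that prints a few system-wide figures from /proc/meminfo next to the resident set size of one process. Run it before and after killing the application. The helper names meminfo_kb and rss_kb are invented for this example, and the choice of fields (MemTotal, MemFree, Slab) is my assumption about what is useful to watch, not something from the answer above.

```shell
#!/bin/sh
# meminfo_kb and rss_kb are hypothetical helper names used only in this sketch.

# Print the value (in kB) of a field such as MemTotal from a meminfo-style file.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "$2"
}

# Print the resident set size (VmRSS, in kB) from a /proc/<pid>/status-style file.
rss_kb() {
    awk '$1 == "VmRSS:" { print $2 }' "$1"
}

# Pass the PID of the Java process as $1; defaults to this shell for demonstration.
pid="${1:-$$}"
if [ -r /proc/meminfo ] && [ -r "/proc/$pid/status" ]; then
    echo "MemTotal: $(meminfo_kb MemTotal /proc/meminfo) kB"
    echo "MemFree:  $(meminfo_kb MemFree /proc/meminfo) kB"
    echo "Slab:     $(meminfo_kb Slab /proc/meminfo) kB"
    echo "PID $pid VmRSS: $(rss_kb "/proc/$pid/status") kB"
fi
```

If MemFree recovers once the process is gone, the memory was held by the application. If it stays low while the process's VmRSS is no longer counted, the kernel side (for example a growing Slab figure) is worth a closer look.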

huangapple
  • Posted on June 22, 2023 at 15:25:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76529479.html