Create a JVM heap dump when a K8s health check restarts the pod (no OOM occurs)


Question


I have a situation where a very long GC pause suddenly occurs, and I need to find the source of the sudden memory allocation. The long GC pause (around 30 seconds) causes the pod to fail several K8s health checks in a row, and the pod gets restarted without an OOM actually happening. I want to create a heap dump before K8s restarts the pod. I realise the dump should be written to some external persistent mount.

The only idea I have for triggering the heap dump is to use the preStop hook.
The question is: is the preStop hook fired when the pod is restarted because of a health check failure?

Maybe there is a more elegant solution to this?

Answer 1

Score: 3


> The question is, whether the preStop hook is fired when the pod is
> restarted because of health check failure?

Yes. By definition, the preStop hook is called immediately before a container is terminated due to an API request or a management event such as a liveness probe failure, preemption, or resource contention.


> Should I use preStop hook to capture Java Heap Dump before pod
> termination?

Yes, but be careful: a call to the preStop hook fails if the container is already in a terminated or completed state. When a pod is terminated, Kubernetes waits for the grace period (30 seconds by default) before sending the KILL signal, and the preStop hook runs inside that window. If the hook is still running when the grace period expires, it gets a one-off extension of 2 seconds. If the preStop hook needs longer than the default grace period allows, you must increase terminationGracePeriodSeconds accordingly.


> Any more elegant solution to this?

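Not that I am aware of. I guess that adding an emptyDir volume to the pod and configuring the JVM to write heap dumps to that directory should work: command: ["java", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/dumps/oom.bin", "-jar", "yourapp.jar"]. Note that these flags only produce a dump when an OutOfMemoryError is thrown, so in the no-OOM case the preStop hook still has to trigger the dump itself and write it to the same directory.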

> Why the above solution will work?

When Kubernetes kills your container because it is not responding to the health check, it just restarts the container. It does not reschedule the pod, so the pod stays on the same node. An emptyDir volume is deleted only when the pod is removed from the node, so the restarted container mounts the same emptyDir, which still contains the heap dump from the previous run. You can therefore kubectl cp those files at any time after the event. There might be other challenges in copying the heap dump files, but they are solvable. Check this for more info.
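Since the question is about a restart with no OutOfMemoryError, something has to request the dump explicitly. The following is only a minimal sketch of one way to do that from inside the application, assuming a HotSpot-based JVM. `HeapDumper` and its `dump` method are hypothetical names, not part of the original answer. How the preStop hook reaches this code (for example through an admin endpoint or JMX) is also an assumption and is left out.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: writes a heap dump on demand, without waiting for an OutOfMemoryError. */
final class HeapDumper {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss-SSS");

    private HeapDumper() {}

    /**
     * Dumps the heap into dumpDir (for example the mounted /dumps volume)
     * and returns the path of the file that was written.
     *
     * @param liveOnly if true, only reachable objects are dumped
     */
    static Path dump(Path dumpDir, boolean liveOnly) throws IOException {
        Files.createDirectories(dumpDir);
        // dumpHeap does not overwrite an existing file, so use a timestamped name.
        // The ".hprof" extension is used because newer JDKs insist on it.
        Path target = dumpDir.resolve(
                "heap-" + LocalDateTime.now().format(STAMP) + ".hprof");
        // Assumption: HotSpot JVM; other JVMs may not expose this MXBean.
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        return target;
    }
}
```

A call such as `HeapDumper.dump(Paths.get("/dumps"), false)` would write the file into the emptyDir mount discussed above. Passing `false` keeps unreachable objects in the dump, which may help when looking for a burst of short-lived allocations. Writing a large heap can take longer than the default grace period, so terminationGracePeriodSeconds probably needs to be raised as well.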

huangapple
  • Published on August 2, 2020, 18:04:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63214691.html