Java (prior to JDK8 update 131) applications running in docker container CPU / Memory issues?
Question
JVMs (JDK 8 before update 131) running in Docker containers ignored the cgroup limits set by the container environment.
Instead, they queried the host's resources rather than what was allocated to the container.
The result is catastrophic for the JVM: as it tried to allocate more resources (CPU or memory) than the cgroup limits permit, the Docker daemon would notice this and kill the JVM process, or the container itself if the Java program was running as PID 1.
Solution for the memory issue (possibly fixed in JDK 8 update 131)
As described above, the JVM was allocating itself more memory than the container allows. This can be fixed easily by either:
- explicitly setting the maximum heap size with -Xmx when starting the JVM (prior to update 131), or
- passing the flags -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap (update 131 and later).
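To see which heap ceiling the JVM actually picked under either option, a tiny check program can help. The following is only a sketch: the class name HeapLimitCheck is made up for illustration, and it relies solely on the standard Runtime.maxMemory() call.

```java
public class HeapLimitCheck {
    /** Returns the JVM's maximum heap size, formatted in MiB. */
    static String describeMaxHeap() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        return "maxHeapMiB=" + (maxBytes / (1024 * 1024));
    }

    public static void main(String[] args) {
        // Run this inside the container, once with -Xmx and once with the
        // cgroup flags, and compare the value against the container's -m limit.
        System.out.println(describeMaxHeap());
    }
}
```

Running it as java -Xmx512m HeapLimitCheck should report a value close to 512 (the exact figure depends on the garbage collector), while a value near the host's memory size suggests the container limit is being ignored.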
Resolving the CPU issue (possibly fixed in JDK 8 update 212)
Again, as described above, a JVM running in Docker would look at the host hardware directly and obtain the total number of CPUs available. It would then size and optimize itself based on that CPU count.
- After JDK 8 update 212, any JVM running in a Docker container respects the CPU limits allocated to the container and does not look at the host CPUs directly.
If a container with a CPU limit is started as below, the JVM respects the limit and restricts itself to 1 CPU:
docker run -ti --cpus 1 -m 1G openjdk:8u212-jdk
// JVMs running in this container are restricted to 1 CPU.
- HERE IS MY QUESTION: The CPU issue is probably fixed in JDK 8 update 212, but what if I cannot update my JVM and I am running a version prior to update 131? How can I fix the CPU issue?
Answer 1
Score: 1
Linux container support first appeared in JDK 10 and was then backported to 8u191; see JDK-8146115.
Earlier versions of the JVM obtained the number of available CPUs as follows.
- Prior to 8u121, the HotSpot JVM relied on the sysconf(_SC_NPROCESSORS_ONLN) libc call. In turn, glibc read the system file /sys/devices/system/cpu/online. Therefore, to fake the number of available CPUs, one could replace this file using a bind mount:
echo 0-3 > /tmp/online
docker run --cpus 4 -v /tmp/online:/sys/devices/system/cpu/online ...
To set only one CPU, write echo 0 instead of echo 0-3.
- Since 8u121, the JVM has been taskset-aware. Instead of sysconf, it started calling sched_getaffinity to find the CPU affinity mask of the process. This broke the bind mount trick. Unfortunately, you can't fake sched_getaffinity the same way as sysconf. However, it is possible to replace the libc implementation of sched_getaffinity using LD_PRELOAD.
I wrote a small shared library, proccount, that replaces both sysconf and sched_getaffinity. This library can therefore be used to set the right number of available CPUs in all JDK versions before 8u191.
How it works
1. First, it reads cpu.cfs_quota_us and cpu.cfs_period_us to find out whether the container was launched with the --cpus option. If both are above zero, the number of CPUs is estimated as cpu.cfs_quota_us / cpu.cfs_period_us.
2. Otherwise, it reads cpu.shares and estimates the number of available CPUs as cpu.shares / 1024.
This CPU calculation is similar to how a modern container-aware JDK actually works.
3. The library defines (overrides) the sysconf and sched_getaffinity functions to return the number of processors obtained in (1) or (2).
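The quota/period and shares arithmetic described above can be sketched in a few lines. This is an illustration only, not the code of the proccount library itself (which is written in C): the class name ContainerCpuEstimate, the cgroup v1 directory /sys/fs/cgroup/cpu, rounding fractional results up, and the fallback value used when nothing can be read are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ContainerCpuEstimate {
    // Assumed cgroup v1 location; the mount point may differ on your system.
    static final String CGROUP_CPU_DIR = "/sys/fs/cgroup/cpu";

    /**
     * Estimates the CPU count: quota / period if both are positive,
     * otherwise shares / 1024, otherwise the given fallback.
     * Rounding up and the minimum of 1 are assumptions of this sketch.
     */
    static int estimate(long quotaUs, long periodUs, long shares, int fallback) {
        if (quotaUs > 0 && periodUs > 0) {
            return (int) Math.max(1, (quotaUs + periodUs - 1) / periodUs);
        }
        if (shares > 0) {
            return (int) Math.max(1, (shares + 1023) / 1024);
        }
        return fallback;
    }

    /** Reads a single number from a file; returns -1 if it is missing or malformed. */
    static long readLong(Path file) {
        try {
            return Long.parseLong(new String(Files.readAllBytes(file)).trim());
        } catch (IOException | NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        long quota = readLong(Paths.get(CGROUP_CPU_DIR, "cpu.cfs_quota_us"));
        long period = readLong(Paths.get(CGROUP_CPU_DIR, "cpu.cfs_period_us"));
        long shares = readLong(Paths.get(CGROUP_CPU_DIR, "cpu.shares"));
        int fallback = Runtime.getRuntime().availableProcessors();
        System.out.println("estimated CPUs: " + estimate(quota, period, shares, fallback));
    }
}
```

For example, a container started with --cpus 1.5 typically has a quota of 150000 and a period of 100000, which this sketch rounds up to 2 CPUs. When no quota is set, cpu.cfs_quota_us holds -1 and the shares branch is used instead.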
How to compile
gcc -O2 -fPIC -shared -olibproccount.so proccount.c -ldl
How to use
LD_PRELOAD=/path/to/libproccount.so java <args>
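To confirm that the preloaded library took effect, one option is to print the CPU count the JVM reports. This is a minimal sketch; the class name CpuCountCheck is hypothetical, and it uses only the standard Runtime.availableProcessors() call.

```java
public class CpuCountCheck {
    /** Returns the number of processors the JVM believes it can use. */
    static String describeCpus() {
        return "availableProcessors=" + Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // Compare the output with and without LD_PRELOAD set; inside a
        // CPU-limited container the two values should differ on old JDKs.
        System.out.println(describeCpus());
    }
}
```

Running it once as java CpuCountCheck and once with LD_PRELOAD=/path/to/libproccount.so in front makes the difference visible: without the library an old JDK is expected to report the host's CPU count.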