CPU and memory issues with Java applications (prior to JDK 8 update 131) running in Docker containers?

Question

JVMs (JDK 8 before update 131) running in Docker containers ignored the CGroup limits set by the container environment.
Instead, they queried the host's resources rather than what was allocated to the container.
The result is catastrophic for the JVM: because it tries to claim more resources (CPU or memory) than the CGroup limits permit, the process gets killed once it exceeds the memory limit (in practice by the kernel OOM killer rather than by the Docker daemon itself), and if the Java program is running as PID 1, the container goes down with it.

Solution for the memory issue (possibly fixed in JDK 8 update 131)
As described above, the JVM was allocating itself more memory than the container allows. This can be fixed easily by

  1. explicitly setting the maximum heap size (using -Xmx) when starting the JVM (prior to update 131)
  2. or by passing these two flags (after update 131):
    -XX:+UnlockExperimentalVMOptions
    -XX:+UseCGroupMemoryLimitForHeap

Resolving the CPU issue (possibly fixed in JDK 8 update 212)
Again, as described above, a JVM running in Docker would look at the host hardware directly and obtain the total number of CPUs available. It would then size and tune itself based on that CPU count.

  1. After JDK 8 update 212, any JVM running in a Docker container respects the CPU limits allocated to the container and does not look at the host CPUs directly.
    If a container with a CPU limit is started as below, the JVM respects this limit and restricts itself to 1 CPU.
    docker run -ti --cpus 1 -m 1G openjdk:8u212-jdk // JVMs running in this container are restricted to 1 CPU.
  2. HERE IS MY QUESTION: the CPU issue is probably fixed in JDK 8 update 212, but what if I cannot update my JVM and I am running a version prior to update 131? How can I fix the CPU issue?

Answer 1

Score: 1

Linux container support first appeared in JDK 10 and was then backported to 8u191; see JDK-8146115.

Earlier versions of the JVM obtained the number of available CPUs as follows.

  • Prior to 8u121, the HotSpot JVM relied on the sysconf(_SC_NPROCESSORS_ONLN) libc call. In turn, glibc read the system file /sys/devices/system/cpu/online. Therefore, to fake the number of available CPUs, one could replace this file using a bind mount:

    echo 0-3 > /tmp/online
    docker run --cpus 4 -v /tmp/online:/sys/devices/system/cpu/online ...
    

    To expose only one CPU, write echo 0 instead of echo 0-3.

  • Since 8u121, the JVM has been taskset aware. Instead of sysconf, it calls sched_getaffinity to find the CPU affinity mask of the process.

    This broke the bind mount trick. Unfortunately, you can't fake sched_getaffinity the same way as sysconf. However, it is possible to replace the libc implementation of sched_getaffinity using LD_PRELOAD.

I wrote a small shared library, proccount, that replaces both sysconf and sched_getaffinity. This library can therefore be used to set the right number of available CPUs in all JDK versions before 8u191.
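To check which of the two mechanisms a given trick actually affects, a small diagnostic can help. The following is only a sketch: the function names (count_cpu_list, online_cpus, affinity_cpus) are made up for illustration and are not taken from the JVM or from any library. It shows how a CPU list such as 0-3 maps to a count, and how each of the two libc calls mentioned above reports the CPU count.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

/* Counts the CPUs named by a kernel CPU-list string such as "0-3" or "0,2-3".
   This is the format of /sys/devices/system/cpu/online. */
int count_cpu_list(const char *list) {
    int count = 0;
    const char *p = list;
    while (*p != '\0') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p) break;              /* no number at this position */
        long last = first;
        p = end;
        if (*p == '-') {                  /* a range "first-last" */
            last = strtol(p + 1, &end, 10);
            if (end == p + 1) break;      /* malformed range */
            p = end;
        }
        if (last >= first) count += (int)(last - first + 1);
        if (*p != ',') break;             /* end of list (or trailing newline) */
        p++;
    }
    return count;
}

/* CPU count via sysconf: the call the answer describes for JDK 8 before 8u121. */
long online_cpus(void) {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* CPU count via the affinity mask: the call the answer describes for 8u121+.
   Returns -1 if the mask cannot be read. */
int affinity_cpus(void) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;
    }
    return CPU_COUNT(&mask);
}
```

Calling online_cpus() and affinity_cpus() from a small main inside the container, with and without the bind mount, should show whether each value changes; the affinity-based count is not expected to react to the replaced file.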

How it works

  1. First, it reads cpu.cfs_quota_us and cpu.cfs_period_us to find out whether the container was launched with the --cpus option. If both are above zero, the number of CPUs is estimated as

    cpu.cfs_quota_us / cpu.cfs_period_us
    
  2. Otherwise, it reads cpu.shares and estimates the number of available CPUs as

    cpu.shares / 1024
    

    This calculation is similar to what a modern container-aware JDK actually does.

  3. The library defines (overrides) the sysconf and sched_getaffinity functions so that they return the number of processors obtained in (1) or (2).
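The three steps can be sketched roughly as follows. This is not the actual proccount source, only a minimal illustration of the approach under several assumptions: the cgroup v1 files are assumed to live under /sys/fs/cgroup/cpu/, fractional results are rounded up, and the helper names (read_long, estimate_cpus, container_cpus) are invented here.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Assumption: cgroup v1 CPU controller mounted at this path. */
#define CGROUP_CPU_DIR "/sys/fs/cgroup/cpu/"

/* Reads one integer from a file; returns -1 if it cannot be read. */
static long read_long(const char *path) {
    long value = -1;
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        if (fscanf(f, "%ld", &value) != 1) value = -1;
        fclose(f);
    }
    return value;
}

/* Steps (1) and (2): quota / period first, otherwise shares / 1024.
   Rounding up is an assumption of this sketch. Returns 0 if nothing is usable. */
int estimate_cpus(long quota, long period, long shares) {
    if (quota > 0 && period > 0) {
        return (int)((quota + period - 1) / period);
    }
    if (shares > 0) {
        return (int)((shares + 1023) / 1024);
    }
    return 0;
}

static int container_cpus(void) {
    return estimate_cpus(read_long(CGROUP_CPU_DIR "cpu.cfs_quota_us"),
                         read_long(CGROUP_CPU_DIR "cpu.cfs_period_us"),
                         read_long(CGROUP_CPU_DIR "cpu.shares"));
}

/* Step (3a): answer _SC_NPROCESSORS_ONLN ourselves, forward everything else. */
long sysconf(int name) {
    static long (*real_sysconf)(int);
    if (real_sysconf == NULL) {
        real_sysconf = (long (*)(int))dlsym(RTLD_NEXT, "sysconf");
    }
    if (name == _SC_NPROCESSORS_ONLN) {
        int cpus = container_cpus();
        if (cpus > 0) return cpus;
    }
    return real_sysconf != NULL ? real_sysconf(name) : -1;
}

/* Step (3b): report an affinity mask with exactly the estimated number of CPUs. */
int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask) {
    static int (*real_getaffinity)(pid_t, size_t, cpu_set_t *);
    int cpus = container_cpus();
    if (cpus > 0) {
        CPU_ZERO_S(cpusetsize, mask);
        for (int i = 0; i < cpus; i++) {
            CPU_SET_S(i, cpusetsize, mask);   /* ignores bits beyond the mask size */
        }
        return 0;
    }
    if (real_getaffinity == NULL) {
        real_getaffinity = (int (*)(pid_t, size_t, cpu_set_t *))
            dlsym(RTLD_NEXT, "sched_getaffinity");
    }
    return real_getaffinity != NULL ? real_getaffinity(pid, cpusetsize, mask) : -1;
}
```

Note that under this literal reading, the default cpu.shares value of 1024 yields one CPU; a real implementation may prefer to treat the default as "no limit set" and fall back to the host value instead.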

How to compile

gcc -O2 -fPIC -shared -olibproccount.so proccount.c -ldl

How to use

LD_PRELOAD=/path/to/libproccount.so java <args>

huangapple
  • Published on October 8, 2020 at 20:49:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64262912.html