Can an external process force the JVM to throw a "java.lang.OutOfMemoryError: GC overhead limit exceeded"?


Question

Is it possible for another process (Java or not) running on the same operating system and hardware to trigger a

java.lang.OutOfMemoryError: GC overhead limit exceeded

by consuming RAM, by generating heavy CPU load, or by some other means?


From the Java 8 documentation

> The detail message "GC overhead limit exceeded" indicates that the garbage collector is running all the time and Java program is making very slow progress. After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection and if it is recovering less than 2% of the heap...

and from this somewhat older thread, I understand that this error is time-sensitive. However, the documentation seems to lack a proper specification of what that 98% refers to.

Edit 20201008: Added a link to the Garbage Collector Ergonomics.

Answer 1

Score: 3

Yes, but this is very unlikely in a real-life scenario.

For the JVM to throw java.lang.OutOfMemoryError: GC overhead limit exceeded, two conditions must be met:

  1. A GC cycle reclaims less than GCHeapFreeLimit (2%) of the heap;
  2. The JVM spends more than GCTimeLimit (98%) of its time doing GC.

An external process can hardly affect the first condition unless it directly interacts with the target application. This means the JVM must already be in an "almost out of memory" state for the error to occur.

What another process can affect is the timing. If it heavily uses shared CPU resources, it can make GC run slower by competing with the JVM for CPU time. Slower GC means longer GC cycles and thus a larger percentage of time spent in GC.
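The two conditions, and the way a slower GC pushes the time percentage up, can be put into a small model. This is only a sketch: the class and method names are made up for illustration, the millisecond figures are hypothetical, and the real JVM may average its statistics over several collections, which this model ignores. The two constants are the default values of the HotSpot flags GCTimeLimit and GCHeapFreeLimit mentioned above.

```java
// Simplified model of the two conditions; NOT HotSpot's actual implementation.
final class GcOverheadModel {
    // Default values (in percent) of GCTimeLimit and GCHeapFreeLimit.
    static final double GC_TIME_LIMIT = 98.0;
    static final double GC_HEAP_FREE_LIMIT = 2.0;

    /** Percentage of elapsed time spent in GC, given GC time and application time. */
    static double gcTimePercent(double gcMillis, double appMillis) {
        return 100.0 * gcMillis / (gcMillis + appMillis);
    }

    /** Both conditions must hold at the same time for the error to be possible. */
    static boolean limitExceeded(double gcTimePercent, double freeHeapPercent) {
        return gcTimePercent > GC_TIME_LIMIT && freeHeapPercent < GC_HEAP_FREE_LIMIT;
    }

    public static void main(String[] args) {
        double appMillis = 20;        // hypothetical: application time per cycle
        double freeHeapPercent = 1.0; // hypothetical: heap is almost full

        // Hypothetical GC durations: without and with CPU contention.
        for (double gcMillis : new double[] {500, 2000}) {
            double percent = gcTimePercent(gcMillis, appMillis);
            System.out.printf("GC %.0f ms -> %.1f%% of time in GC, limit exceeded: %b%n",
                    gcMillis, percent, limitExceeded(percent, freeHeapPercent));
        }
    }
}
```

With the application time fixed, only the GC duration changes between the two cases, which is exactly the lever an external CPU-hungry process has.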

I was able to create an artificial example in which another process makes the JVM throw GC overhead limit exceeded, but this was really tricky.

Consider the following Java program.

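import java.util.ArrayList;

public class GCOverheadLimit {
    // Everything in this list stays strongly reachable and survives GC.
    static ArrayList<Object> garbage = new ArrayList<>();
    // Small reserve that is released once the heap is full.
    static byte[] reserve = new byte[100_000];

    static void fillHeap() {
        try {
            // Allocate until the JVM runs out of memory.
            while (true) {
                garbage.add(new byte[10_000]);
            }
        } catch (OutOfMemoryError e) {
            // Drop the reserve so that a little free memory remains.
            reserve = null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Filling heap");
        fillHeap();

        System.out.println("Starting GC loop");
        while (true) {
            // The array becomes garbage as soon as it is removed from the list.
            garbage.add(new byte[10_000]);
            garbage.remove(garbage.size() - 1);
            // Pause between iterations to keep the GC overhead below the limit.
            Thread.sleep(20);
        }
    }
}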

First, it fills the entire heap with non-reclaimable objects, leaving a small reserve of free memory. Then it repeatedly allocates reclaimable garbage to make GC happen again and again. There is a small delay between iterations to keep the total GC overhead below 98%.
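To see how close a run is to the limit, the accumulated GC time can be read from inside the JVM through the standard java.lang.management API. The sketch below is not part of the original experiment, and the class name GcTimeProbe is made up. It reports GC time as a share of total JVM uptime, which is only a rough indicator and not necessarily the same statistic the JVM uses for its own decision.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Rough GC overhead probe; an illustration, not the JVM's internal accounting.
final class GcTimeProbe {
    /** Accumulated GC time in ms over all collectors; undefined values (-1) count as 0. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** GC time as a percentage of the JVM uptime so far. */
    static double gcPercentOfUptime() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.gc(); // request a collection so that there is something to report
        System.out.printf("GC time: %d ms (%.2f%% of uptime)%n",
                totalGcMillis(), gcPercentOfUptime());
    }
}
```

Calling gcPercentOfUptime() periodically from the GC loop would show the percentage climbing once the competing processes start.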

The experiment uses a 1 GB heap and the Parallel GC:

java -Xmx1g -Xms1g -XX:+UseParallelGC GCOverheadLimit

I ran this program in a cgroup with a CPU quota. My machine has 4 cores, but I let the JVM use only 200 ms of CPU time per 100 ms period.

mkdir /sys/fs/cgroup/cpu/test
echo 200000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
echo $JAVA_PID > /sys/fs/cgroup/cpu/test/cgroup.procs

So far the program works fine. Now I run one or two CPU-burning processes in the same cgroup:

sha1sum /dev/zero &
echo $! > /sys/fs/cgroup/cpu/test/cgroup.procs

Because the quota is exceeded, the OS starts to throttle the processes in the cgroup. GC times increase, and the JVM finally throws java.lang.OutOfMemoryError: GC overhead limit exceeded.
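Whether throttling is actually taking place can be checked from the cpu.stat file of the cgroup, which in the cgroup v1 CPU controller lists nr_periods, nr_throttled and throttled_time. The following sketch assumes the same /sys/fs/cgroup/cpu/test directory as above; the class name CgroupCpuStat is made up, and the parser simply expects "key value" lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reads throttling counters of a cgroup v1 CPU controller (illustrative sketch).
final class CgroupCpuStat {
    /** Parses "key value" lines such as "nr_throttled 40" into a map. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> stats = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                stats.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the cgroup created in the commands above.
        Path statFile = Paths.get("/sys/fs/cgroup/cpu/test/cpu.stat");
        if (!Files.isReadable(statFile)) {
            System.out.println("Cannot read " + statFile);
            return;
        }
        Map<String, Long> stats = parse(Files.readAllLines(statFile));
        System.out.println("Periods:           " + stats.get("nr_periods"));
        System.out.println("Throttled periods: " + stats.get("nr_throttled"));
        System.out.println("Throttled time:    " + stats.get("throttled_time") + " ns");
    }
}
```

A growing nr_throttled value after the sha1sum processes join the cgroup indicates that the JVM is competing for the quota.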

Note: reproducing the problem required careful selection of parameters (heap size, delays, quota). The parameters will be different on other machines and in other environments. My point is that the problem is theoretically possible but will probably never happen in practice, since too many factors need to line up.

huangapple
  • Published on 2020-10-01 19:31:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64154506.html