2020年10月22日 01:40:59go评论100阅读模式

英文:

Why when processing JVMTI EVENT_EXCEPTION the JVM sometimes crashes or starts slowly?

问题

我决定使用JVMTI在JVM中注册所有发生的异常并构建直方图：
ClassName_method_ExceptionName:count

代理程序的附加方式如下：
taskset -c 10,12 java -agentpath:./jni-agent.so ${JVM_OPTION} -cp module.jar org.springframework.boot.loader.PropertiesLauncher &

我使用以下代码作为JNI代理程序：

#include <stdio.h>
#include <jvmti.h>
#include <jni.h>
#include <map>
#include <string>
#include <cstring>
#include "utils.h"
long excCount;
using namespace std;
map<string, long> mapExceptions;
map<string, long> mapContendedThreadStack;
void mapAddException(map<string, long> &inMap, string key){
    map<string, long>::iterator it = inMap.find(key);
    if(it == inMap.end()){
        inMap.insert(pair<string, long>(key,1));
        return;
    }
    inMap.find(key)->second ++;
    return;
}
..
..
void JNICALL ExceptionCallback(jvmtiEnv* jvmti, JNIEnv *env, jthread thread, jmethodID method, jlocation location,
jobject exception, jmethodID catch_method, jlocation catch_location) {
    excCount ++;
    char *class_name;
    jclass exception_class = env->GetObjectClass(exception);
    jvmti->GetClassSignature(exception_class, &class_name, NULL);
    char* method_name;
    jvmti->GetMethodName(method, &method_name, NULL, NULL);
    jclass holder;
    jvmti->GetMethodDeclaringClass(method, &holder);
    char* holder_name;
    jvmti->GetClassSignature(holder, &holder_name, NULL);
    if(strstr(holder_name, "java/lang") != NULL
        || strstr(holder_name, "java/net") != NULL
        || strstr(holder_name, "sun/reflect") != NULL
        || strstr(holder_name, "org/springframework/boot/loader/jar/JarURLConnection") != NULL
        || strstr(holder_name, "okhttp3/internal/connection/RealConnection") != NULL
        || strstr(holder_name, "okio/AsyncTimeout") != NULL ){
            jvmti->Deallocate((unsigned char*) method_name);
            jvmti->Deallocate((unsigned char*) holder_name);
            jvmti->Deallocate((unsigned char*) class_name);
            return;
    }
    string trimres = trimClassName(convertToString(holder_name, strlen(holder_name)));        
    char buftotal[1024];
    snprintf(buftotal, sizeof(buftotal) - 1, "%s#%s()_%s", trimres.c_str(), method_name, trimClassName(class_name).c_str());
    replacechar(buftotal, '/', '.');
    
    //mapAddException(mapExceptions, buftotal);       <-- 这里
    jvmti->Deallocate((unsigned char*) method_name);
    jvmti->Deallocate((unsigned char*) holder_name);
    jvmti->Deallocate((unsigned char*) class_name);
}
extern "C" JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
    jvmtiCapabilities capabilities;
    jvmtiEventCallbacks callbacks;
    jvmtiEnv *jvmti;
    vm->GetEnv((void **) &jvmti, JVMTI_VERSION_1_0);
    printf("\nNative agent loaded by -agent.....\n");
    capabilities = {0};
    capabilities.can_generate_exception_events = 1;
    jvmti->AddCapabilities(&capabilities);
    callbacks = {0};
    callbacks.Exception = ExceptionCallback;
    jvmti->SetEventCallbacks(&callbacks, sizeof(callbacks));
    jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_EXCEPTION, NULL);
    return 0;
}

在有注释的地方 mapAddException(mapExceptions, buftotal); ，使用这个注释之后，JVM 的启动时间比没有使用这个JNI代理程序的时候要长得多。

我忘了说 - 这是一个正常工作的 Spring Boot 应用程序，里面只有 "Hello World"

但是当我取消注释 mapAddException(mapExceptions, buftotal); 的时候，JVM 有时会崩溃。并不总是崩溃，但有时会崩溃（大约50次中的2-4次）。

以下是从崩溃转储中摘录的部分：

..
#  SIGSEGV (0xb) at pc=0x00007f32781bf0b4, pid=47559, tid=0x00007f31e29e1700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_231-b11) (build 1.8.0_231-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.231-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libstdc++.so.6+0x750b4]
..
#
---------------  T H R E A D  ---------------
Current thread (0x00007f3275d82000):  JavaThread "tomcat-thread-16" daemon [_thread_in_native, id=47682, stack(0x00007f31e28e1000,0x00007f31e29e2000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000010
Registers:
RAX=0x0000000000000000, RBX=0x00007f3190009340, RCX=0x0000000000000001, RDX=0x00007f3274ba0eb0
RSP=0x00007f31e29dc7c8, RBP=0x00007f327414fea0, RSI=0x00007f32786541d0, RDI=0x00007f327414fea0
R8 =0x0000000000000002, R9 =0x0000000000000030, R10=0x000000000000000c, R11=0x00007f327a316f40
R12=0x00007f31a00504f0, R13=0x00007f32786541c8, R14=0x00007f32786541d0, R15=0x00007f3190009340
RIP=0x00007f32781bf0b4, EFLAGS=0x0000000000010283, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
<details>
<summary>英文:</summary>
I decided using JVMTI to register all exceptions occured in JVM and build Histogram: &lt;br/&gt;*ClassName_method_ExceptionName:count*
Agent attached like this:&lt;br/&gt; *taskset -c 10,12 java -agentpath:./jni-agent.so ${JVM_OPTION} -cp module.jar org.springframework.boot.loader.PropertiesLauncher &amp;*
I using this code for JNI agent:
enter code here
#include &lt;stdio.h&gt;
#include &lt;jvmti.h&gt;
#include &lt;jni.h&gt;
#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;cstring&gt;
#include &quot;utils.h&quot;
long excCount;
using namespace std;
map&lt;string, long&gt; mapExceptions;
map&lt;string, long&gt; mapContendedThreadStack;
void mapAddException(map&lt;string,long&gt; &amp;inMap, string key){
map&lt;string, long&gt; :: iterator it = inMap.find(key);
if(it == inMap.end()){
inMap.insert(pair&lt;string, long&gt;(key,1));
return;
}
inMap.find(key)-&gt;second ++;
return;
}
..
..
void JNICALL ExceptionCallback(jvmtiEnv* jvmti, JNIEnv *env, jthread thread, jmethodID method,     jlocation location,
jobject exception, jmethodID catch_method, jlocation catch_location) {
excCount ++;
char *class_name;
jclass exception_class = env-&gt;GetObjectClass(exception);
jvmti-&gt;GetClassSignature(exception_class, &amp;class_name, NULL);
char* method_name;
jvmti-&gt;GetMethodName(method, &amp;method_name, NULL, NULL);
jclass holder;
jvmti-&gt;GetMethodDeclaringClass(method, &amp;holder);
char* holder_name;
jvmti-&gt;GetClassSignature(holder, &amp;holder_name, NULL);
if(strstr(holder_name, &quot;java/lang&quot;) != NULL
|| strstr(holder_name, &quot;java/net&quot;) != NULL
|| strstr(holder_name, &quot;sun/reflect&quot;) != NULL
|| strstr(holder_name, &quot;org/springframework/boot/loader/jar/JarURLConnection&quot;) != NULL
|| strstr(holder_name, &quot;okhttp3/internal/connection/RealConnection&quot;) != NULL
|| strstr(holder_name, &quot;okio/AsyncTimeout&quot;) != NULL ){
jvmti-&gt;Deallocate((unsigned char*) method_name);
jvmti-&gt;Deallocate((unsigned char*) holder_name);
jvmti-&gt;Deallocate((unsigned char*) class_name);
return;
}
string trimres = trimClassName(convertToString(holder_name, strlen(holder_name)));		
char buftotal[1024];
snprintf(buftotal, sizeof(buftotal) - 1, &quot;%s#%s()_%s&quot;, trimres.c_str(), method_name, trimClassName(class_name).c_str());
replacechar(buftotal, &#39;/&#39;, &#39;.&#39;);
//mapAddException(mapExceptions, buftotal);       &lt;-- HERE 
jvmti-&gt;Deallocate((unsigned char*) method_name);
jvmti-&gt;Deallocate((unsigned char*) holder_name);
jvmti-&gt;Deallocate((unsigned char*) class_name);
}
extern &quot;C&quot; JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
jvmtiCapabilities capabilities;
jvmtiEventCallbacks callbacks;
jvmtiEnv *jvmti;
vm-&gt;GetEnv((void **) &amp;jvmti, JVMTI_VERSION_1_0);
printf(&quot;\nNative agent loaded by -agent.....\n&quot;);
capabilities = {0};
capabilities.can_generate_exception_events = 1;
jvmti-&gt;AddCapabilities(&amp;capabilities);
callbacks = {0};
callbacks.Exception = ExceptionCallback;
jvmti-&gt;SetEventCallbacks(&amp;callbacks, sizeof(callbacks));
jvmti-&gt;SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_EXCEPTION, NULL);
return 0;
}
With commented line mapAddException(...) the JVM started  much longer than without the JNI-agent.
Is this Ok ?&lt;br/&gt;
I forgot to say - this is Spring Boot application with &quot;Hello World&quot; all ok :)
But when I uncommented mapAddException(...) JVM sometimes crashed. Not at all, sometimes(~ 2-4 of 50).&lt;br/&gt;
This is cut from  CrashDump:
..
#  SIGSEGV (0xb) at pc=0x00007f32781bf0b4, pid=47559, tid=0x00007f31e29e1700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_231-b11) (build 1.8.0_231-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.231-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libstdc++.so.6+0x750b4]
..
#
---------------  T H R E A D  ---------------
Current thread (0x00007f3275d82000):  JavaThread &quot;tomcat-thread-16&quot; daemon [_thread_in_native, id=47682, stack(0x00007f31e28e1000,0x00007f31e29e2000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000010
Registers:
RAX=0x0000000000000000, RBX=0x00007f3190009340, RCX=0x0000000000000001, RDX=0x00007f3274ba0eb0
RSP=0x00007f31e29dc7c8, RBP=0x00007f327414fea0, RSI=0x00007f32786541d0, RDI=0x00007f327414fea0
R8 =0x0000000000000002, R9 =0x0000000000000030, R10=0x000000000000000c, R11=0x00007f327a316f40
R12=0x00007f31a00504f0, R13=0x00007f32786541c8, R14=0x00007f32786541d0, R15=0x00007f3190009340
RIP=0x00007f32781bf0b4, EFLAGS=0x0000000000010283, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e
....
The Current thread are different every time at CrashDump.
</details>
# 答案1
**得分**: 3
代码部分不要翻译，只返回翻译好的内容：
**崩溃的原因很简单：你的代码不是线程安全的。**
在同一个抛出异常的线程上调用异常回调。如果多个线程同时抛出异常，回调也会同时被调用。`std::map` 不是线程安全的：并发访问可能会在 libstdc++ 内部导致崩溃。
**减慢速度的原因更有趣。**
如何分析与 JVM 和本地代码交互相关的性能问题？当然可以使用 [async-profiler](https://github.com/jvm-profiling-tools/async-profiler/)。在分析应用程序启动性能时，将分析器作为代理启动非常方便。

java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=out.svg -agentpath:./your-agent.so -cp ...


当我比较带有你的代理和不带代理的性能分析时，我注意到一个主要的区别。使用代理时，在左侧有一个明显的 `[deoptimization]` 块：
![deopt.png][1]
为什么会发生去优化？async-profiler 将再次提供帮助：让我们分析 `Deoptimization::deoptimize` 事件。这实际上不是一个事件，而是 HotSpot JVM 中的一个函数名（async-profiler 可以跟踪本地函数的调用）。

java -agentpath:/path/to/libasyncProfiler.so=start,event=Deoptimization::deoptimize,file=deopt.svg ...


有超过 8000 个去优化事件！以下是图表的一小部分：
![deopt2.png][2]
大多数去优化事件来自于 `exception_handler_for_pc_helper`。如果我们查看一下这个函数的[源代码](http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/73f624a2488d/src/share/vm/c1/c1_Runtime1.cpp#l493)，我们会看到一个有趣的注释：
```cpp
if (JvmtiExport::can_post_on_exceptions()) {
// 为了确保正确通知异常捕获和抛出，我们必须在这里进行去优化。
&lt;...&gt;
// 我们实际上并不想对 nmethod 本身进行去优化，因为我们实际上可以在异常处理程序中继续执行，但我没有看到实现所需效果的简单方法。
Deoptimization::deoptimize_frame(thread, caller_frame.id());

这意味着 - 当启用了 JVM TI 异常通知时，HotSpot 在编译代码中每次发生异常时都会强制进行去优化！如果异常频繁发生，这肯定会影响性能。

那么如何在没有 JVM TI 开销的情况下分析异常？

此时，您可能已经猜到答案 - 再次使用 async-profiler 除了 CPU 和本地函数分析外，它还可以截取任何 Java 方法调用。

要找到创建异常和错误的所有位置，我们可以截取 Throwable 构造函数。为此，将 java.lang.Throwable.<init> 指定为 async-profiler 事件。这次生成一个反转的调用树可能更方便（添加 reversed,file=tree.html 选项），以便按类型和构造位置对可抛出项进行分组。

java -agentpath:/path/to/libasyncProfiler.so=start,event=java.lang.Throwable.\&lt;init\&gt;,reverse,file=tree.html ...

这将生成一个交互式的 HTML 树，在树中您将找到所有带有计数器和完整堆栈跟踪的异常：

为什么在处理JVMTI EVENT_EXCEPTION时，JVM有时会崩溃或启动缓慢？

哎呀！一个简单的 Spring Boot 应用程序抛出了 4700 个异常：其中 90% 是 ClassNotFoundException，7% 是 NoSuchMethodException。

重要的是，async-profiler 对方法跟踪使用字节码插装，因此不会受到 JVM TI 异常回调的性能问题的影响。

额外奖励

在首次对您的代理进行性能分析时，除了去优化问题外，我还发现了相当数量的堆栈跟踪，其中 CPU 时间花费在调用 ExceptionCallback 的 GetMethodDeclaringClass 和 GetMethodName：

为什么在处理JVMTI EVENT_EXCEPTION时，JVM有时会崩溃或启动缓慢？

这实际上是 JDK 8 中的一个性能 bug，我在 3 年前报告了这个问题：JDK-8185348。最终，在 JDK 8u281 中修复了这个问题。但是，在 JDK 9+ 中并没有这样的问题。

简而言之，由于所有 JVM TI Method 函数都很慢，所以很有必要缓存这些函数的结果，以便代理不会为同一个方法多次调用 JVM TI 函数。

英文:

The crash reason is simple: your code is not thread safe.

Exception callback is invoked on the same thread that throws an exception. If multiple threads throw an exception concurrently, the callback is also invoked concurrently. std::map is not thread safe: concurrent access may cause a crash inside libstdc++.

The reason of slowness is more interesting.

How to analyze performance issues related to interaction between JVM and native? With async-profiler, of course. When profiling application startup, it's convenient to start profiler as an agent.

java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=out.svg -agentpath:./your-agent.so -cp ...

When I compared profiles with and without your agent, I noticed one major difference. With the agent, there was a noticeable [deoptimization] block on the left side:

Why does deoptimization happen? async-profiler will help again: let's profile Deoptimization::deoptimize event now. It's not really an event, this is just a function name in HotSpot JVM (async-profiler can trace invocations of native functions).

java -agentpath:/path/to/libasyncProfiler.so=start,event=Deoptimization::deoptimize,file=deopt.svg ...

There was more than 8000 deoptimization events! Here is just a small part of the graph:

Most deoptimizations originate from exception_handler_for_pc_helper. If we look at the source code of this function, we'll see an interesting comment:

  if (JvmtiExport::can_post_on_exceptions()) {
// To ensure correct notification of exception catches and throws
// we have to deoptimize here.
&lt;...&gt;
// We don&#39;t really want to deoptimize the nmethod itself since we
// can actually continue in the exception handler ourselves but I
// don&#39;t see an easy way to have the desired effect.
Deoptimization::deoptimize_frame(thread, caller_frame.id());

What this means is - when JVM TI Exception notifications are on, HotSpot forces deoptimization every time an exception happens in the compiled code! This can surely hit performance if exceptions are frequent.

How to analyze exceptions without JVM TI overhead then?

At this point you may already guess the answer - with async-profiler again Besides CPU and native function profiling, it can also intercept any Java method call.

To find all places where exceptions and errors are created, we can just intercept Throwable constructors. To do so, specify java.lang.Throwable.<init> as async-profiler event. This time it's handy to produce a reversed call tree (add reversed,file=tree.html options), so that throwables will be grouped by the type and the construction site.

java -agentpath:/path/to/libasyncProfiler.so=start,event=java.lang.Throwable.\&lt;init\&gt;,reverse,file=tree.html ...

This will produce an interactive HTML tree, where you'll find all exceptions with counters and full stack traces:

Ouch! A simple Spring Boot application throws 4700 exceptions: most of them are ClassNotFoundException (90%) and NoSuchMethodException (7%).

What is important, async-profiler uses bytecode instrumentation for method tracing, so it does not suffer from the performance problem of JVM TI exception callback.

Bonus

When profiling your agent for the first time, besides the deoptimization issue, I also found a decent number of stack traces where CPU time is spent in your ExceptionCallback calling GetMethodDeclaringClass and GetMethodName:

This is actually a performance bug in JDK 8, which I reported 3 years ago: JDK-8185348. At last, it has been fixed in JDK 8u281. However, there was no such problem in JDK 9+.

In short, since all JVM TI Method functions are slow, it is a good idea to cache the result of these functions, so that the agent will not call JVM TI function for the same method more than once.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

为什么在处理JVMTI EVENT_EXCEPTION时，JVM有时会崩溃或启动缓慢？

问题

Java双重循环和条件使用stdDraw绘制图片，找不到错误。

jOOQ与Postgres的PERCENTILE_CONT和MEDIAN函数类型转换问题

在Java中，while循环的时间复杂度是：

Maven在JUnit5中不运行@ParameterizedTest。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论