When a busy-spinning Java thread is bound to a physical core, can a context switch happen because a new branch in the code is reached?
Question
I am interested in low-latency code and that's why I tried to configure thread affinity. In particular, it was supposed to help to avoid context switches.
I have configured thread affinity using https://github.com/OpenHFT/Java-Thread-Affinity. I run a very simple piece of test code that just spins in a loop checking a time condition.
long now = start;
while (true)
{
    if (now < start + TimeUtils.NANOS_IN_SECOND * delay)
    {
        now = TimeUtils.now();
    }
    else
    {
        // Will be printed after 30 sec
        if (TimeUtils.now() > start + TimeUtils.NANOS_IN_SECOND * (delay + 30))
        {
            final long finalNow = now;
            System.out.println("Time is over at " +
                    TimeUtils.toInstant(finalNow) + " now: " +
                    TimeUtils.toInstant(TimeUtils.now()));
            System.exit(0);
        }
    }
}
So, after the specified delay, execution goes to the "else" branch, and at approximately the same time I see a context switch. Is this expected behavior? What is the specific reason for it? How can I avoid context switches in such cases?
Test Details
I build a shadowJar from this repo: https://github.com/stepan2271/thread-affinity-example. Then I run it using the following command (one can play around with the numbers here; they do not have a significant effect on the test when delay > 60):
taskset -c 19 java -DtestLoopBindingCpu=3 -Ddelay=74 -cp demo-all.jar main.TestLoop
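The binding inside the JVM is done with Java-Thread-Affinity. For reference, below is a minimal sketch of that approach, assuming the library's AffinityLock.acquireLock(int) and release() methods; the class name, helper method and default CPU id are illustrative and not taken from the linked repository.

import net.openhft.affinity.AffinityLock;

// Sketch only: pin the current thread to the CPU given by -DtestLoopBindingCpu.
// Class name and loop stub are illustrative; the real setup lives in the linked repo.
public class PinnedSpinner
{
    public static void main(String[] args)
    {
        int cpuId = Integer.getInteger("testLoopBindingCpu", 3);
        AffinityLock lock = AffinityLock.acquireLock(cpuId); // bind the current thread to that CPU
        try
        {
            runBusySpinLoop(); // the busy-spin loop shown above
        }
        finally
        {
            lock.release();
        }
    }

    private static void runBusySpinLoop()
    {
        // ... the time-checking loop from the question ...
    }
}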
I also have the following test script to monitor context switches (it should be run with the ID of the Java thread that is bound to the core):
#!/bin/bash
while [ true ]
do
date >> ~/demo-ctxt-switches.log
cat /proc/$1/status | grep ctxt >> ~/demo-ctxt-switches.log
sleep 3
done
Typical output of this script is the following:
Fri Oct 16 18:23:29 MSK 2020
voluntary_ctxt_switches: 90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:32 MSK 2020
voluntary_ctxt_switches: 90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:35 MSK 2020
voluntary_ctxt_switches: 90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:38 MSK 2020
voluntary_ctxt_switches: 90
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:41 MSK 2020
voluntary_ctxt_switches: 91
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:44 MSK 2020
voluntary_ctxt_switches: 91
nonvoluntary_ctxt_switches: 37
Fri Oct 16 18:23:47 MSK 2020
voluntary_ctxt_switches: 91
nonvoluntary_ctxt_switches: 37
So, after some fluctuation at start time these numbers become stable, and then I see from 1 to 3 switches exactly at the time (the difference is less than 1 sec) when the code reaches the "else" branch.
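For completeness, the same counters can also be polled from Java instead of the shell script. A rough sketch, assuming a Linux /proc layout; the class name is illustrative and the 3-second cadence mirrors the script above.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch only: poll the voluntary/nonvoluntary context-switch counters of a Linux task.
// Pass the native thread id (the same id the shell script expects) as the first argument.
public class CtxtSwitchWatcher
{
    public static void main(String[] args) throws IOException, InterruptedException
    {
        Path status = Path.of("/proc/" + args[0] + "/status");
        while (true)
        {
            List<String> lines = Files.readAllLines(status);
            for (String line : lines)
            {
                if (line.startsWith("voluntary_ctxt_switches") || line.startsWith("nonvoluntary_ctxt_switches"))
                {
                    System.out.println(java.time.Instant.now() + " " + line);
                }
            }
            Thread.sleep(3_000); // same 3-second cadence as the script
        }
    }
}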
Deviations
The base configuration reproduces this behavior almost every time, while some deviations lead to a situation where I was not able to reproduce it. Examples:
https://github.com/stepan2271/thread-affinity-example/tree/without-log4j
https://github.com/stepan2271/thread-affinity-example/tree/without-cached-nano-clock
Test environment
2 * Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz
Red Hat Enterprise Linux 8.1 (Ootpa)
Cores are isolated using CPUAffinity in /etc/systemd/system.conf and /etc/systemd/user.conf
/etc/sysconfig/irqbalance is configured.
Openjdk 11.0.6 2020-01-14 LTS Runtime Environment 18.9
Answer 1
Score: 28
A voluntary context switch usually means a thread is waiting for something, e.g. for a lock to become free.
async-profiler can help to find where context switches happen. Here is a command line I used:
./profiler.sh -d 80 -e context-switches -i 2 -t -f switches.svg -I 'main*' -X 'exit_to_usermode_loop*' PID
Let's go through it in detail:
- -d 80: run profiler for at most 80 seconds.
- -e context-switches: an event to profile.
- -i 2: interval = 2 events. I profile every second context switch, since the profiling signal itself causes a context switch, and I don't want to fall into recursion.
- -t: split the profile by threads.
- -f switches.svg: output file name; the svg extension automatically selects Flame Graph format.
- -I 'main*': include only the main thread in the output.
- -X 'exit_to_usermode_loop*': exclude events related to nonvoluntary context switches.
- PID: Java process ID to profile.
The results may differ from one run to another. Typically I see from 0 to 3 context switches on each graph.
Here are the most common places where a context switch happens. They are indeed related to waiting on a mutex.
- ThreadSafepointState::handle_polling_page_exception() called from TestLoop.main. This means a thread has been stopped at a safepoint requested by another thread. To investigate the reason for the safepoint, add the -Xlog:safepoint* JVM option.
[75.889s][info][safepoint ] Application time: 74.0071000 seconds
[75.889s][info][safepoint ] Entering safepoint region: Cleanup
[75.889s][info][safepoint,cleanup] deflating idle monitors, 0.0000003 secs
[75.889s][info][safepoint,cleanup] updating inline caches, 0.0000058 secs
[75.890s][info][safepoint,cleanup] compilation policy safepoint handler, 0.0000004 secs
[75.890s][info][safepoint,cleanup] purging class loader data graph, 0.0000001 secs
[75.890s][info][safepoint,cleanup] resizing system dictionaries, 0.0000009 secs
[75.890s][info][safepoint,cleanup] safepoint cleanup tasks, 0.0001440 secs
[75.890s][info][safepoint ] Leaving safepoint region
Right, a Cleanup safepoint happens shortly after 74 seconds (exactly the specified delay). The purpose of a Cleanup safepoint is to run periodic tasks; in this case - to update inline caches. If there is cleanup work to do, a safepoint may happen every GuaranteedSafepointInterval milliseconds (1000 by default). You can disable periodic safepoints by setting -XX:GuaranteedSafepointInterval=0, but this may have performance implications.
- SharedRuntime::handle_wrong_method() from TimeUtils.now. This happens when a call site in the compiled code has been made non-entrant. As this is related to JIT compilation, add the -XX:+PrintCompilation option.
75032 1430 % 4 main.TestLoop::main @ 149 (245 bytes) made not entrant
75033 1433 % 3 main.TestLoop::main @ 149 (245 bytes)
75033 1434 4 util.RealtimeNanoClock::nanoTime (8 bytes)
75034 1431 3 util.RealtimeNanoClock::nanoTime (8 bytes) made not entrant
75039 1435 % 4 main.TestLoop::main @ 149 (245 bytes)
75043 1433 % 3 main.TestLoop::main @ 149 (245 bytes) made not entrant
Yes, both TestLoop.main and RealtimeNanoClock.nanoTime were recompiled 75 seconds after JVM start. To find out the reason, add -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation.
This will produce a large compilation log, where we'll look for an event that happened at the 75th second.
<uncommon_trap thread='173414' reason='unstable_if' action='reinterpret' debug_id='0' compile_id='1232' compile_kind='osr' compiler='c2' level='4' stamp='75.676'>
<jvms bci='161' method='main.TestLoop main ([Ljava/lang/String;)V' bytes='245' count='1' backedge_count='533402' iicount='1'/>
That was an uncommon trap due to unstable_if at bytecode index 161. In other words, when main was JIT-compiled, HotSpot did not produce code for the else branch, because it had never been executed before (a kind of speculative dead code elimination). However, to retain correctness of the compiled code, HotSpot places a trap to deoptimize and fall back to the interpreter if the speculative condition fails. This is exactly what happens in your case when the if condition becomes false.
- Runtime1::counter_overflow(). This is again related to recompilation. After running C1 compiled code for some time, HotSpot discovers that the code is hot, and decides to recompile it with C2. In this case I caught a contended lock on the compiler queue.
Conclusion
HotSpot JIT compilers heavily rely on speculative optimizations. When a speculative condition fails, this leads to deoptimization. Deoptimization is indeed very bad for low latency applications: besides switching to slow execution in the interpreter, this may indirectly cause undesired pauses due to acquiring locks in the JVM runtime, or bringing the JVM to a safepoint.
Common reasons for deoptimization are unstable_if and class_check. If you want to avoid deoptimization on a latency-critical path, make sure to "warm up" all code paths and all possible receivers of virtual methods.
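A sketch of what such a warm-up could look like for the unstable_if case above; this is not from the answer, and the method names, threshold and iteration count are hypothetical.

// Sketch only: exercise both sides of the branch during warm-up so the JIT compiles
// both paths and the unstable_if uncommon trap is not hit later on the hot path.
// Method names, the threshold and the iteration count are hypothetical.
public class WarmUpExample
{
    static long process(long now, long deadline)
    {
        if (now < deadline)
        {
            return now;               // fast path, taken while waiting
        }
        else
        {
            return handleExpiry(now); // slow path that must also be seen before the critical phase
        }
    }

    static long handleExpiry(long now)
    {
        return now;
    }

    public static void main(String[] args)
    {
        // Warm-up: feed values on both sides of the deadline so neither branch
        // is speculatively pruned by the compiler.
        long sink = 0;
        for (int i = 0; i < 100_000; i++)
        {
            sink += process(i, 50_000);
        }
        System.out.println("warm-up done: " + sink);
        // ... the latency-critical busy-spin phase would start here ...
    }
}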