2020-09-24 23:49:59

What may cause the number of WAITING threads to keep increasing in Java 8?

Question

Today I found that my Java 8 app has many threads in the WAITING state:

[arthas@1]$ thread --state RUNNABLE
Threads Total: 3427, NEW: 0, RUNNABLE: 17, BLOCKED: 0, WAITING: 3114, TIMED_WAITING: 296, TERMINATED: 0                                                                              
ID             NAME                                         GROUP                          PRIORITY       STATE          %CPU           TIME           INTERRUPTED    DAEMON         
124            pool-11-thread-25                            main                           5              RUNNABLE       75             0:0            false          false          
53             as-command-execute-daemon                    system                         10             RUNNABLE       23             0:0            false          true           
133            Thread-20                                    main                           5              RUNNABLE       1              0:2            false          true           
28             Apollo-RemoteConfigLongPollService-1         Apollo                         5              RUNNABLE       0              0:0            false          true           
32             Attach Listener                              system                         9              RUNNABLE       0              0:0            false          true           
99             DestroyJavaVM                                main                           5              RUNNABLE       0              0:39           false          false          
4              Signal Dispatcher                            system                         9              RUNNABLE       0              0:0            false          true           
19             grpc-default-worker-ELG-1-1                  main                           5              RUNNABLE       0              0:0            false          true           
21             grpc-default-worker-ELG-1-2                  main                           5              RUNNABLE       0              0:0            false          true           
97             http-nio-11003-Acceptor                      main                           5              RUNNABLE       0              0:0            false          true           
85             http-nio-11003-BlockPoller                   main                           5              RUNNABLE       0              0:0            false          true           
96             http-nio-11003-ClientPoller                  main                           5              RUNNABLE       0              0:0            false          true           
54             lettuce-nioEventLoop-4-1                     main                           5              RUNNABLE       0              0:0            false          true           
70             lettuce-nioEventLoop-4-2                     main                           5              RUNNABLE       0              0:0            false          true           
36             nioEventLoopGroup-3-1                        system                         10             RUNNABLE       0              0:0            false          false          
42             nioEventLoopGroup-3-2                        system                         10             RUNNABLE       0              0:0            false          false          
37             nioEventLoopGroup-4-1                        system                         10             RUNNABLE       0              0:0            false          false          
Affect(row-cnt:0) cost in 120 ms.

More than 3000 threads are in the WAITING state. I picked a random WAITING thread-pool thread, and it looks like this:

[arthas@1]$ thread 4410
"pool-96-thread-10" Id=4410 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3e27c029
    at sun.misc.Unsafe.park(Native Method)
    -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3e27c029
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Affect(row-cnt:0) cost in 16 ms.

But the problem is that I don't know where these threads were started or what makes the number of waiting threads grow. Is there any way to find out where the threads are started, or why the WAITING threads keep increasing? I am currently using Java's thread pool executors. Right now the number of waiting threads is up to 6000+. I added a custom config:

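import java.util.concurrent.Executors;

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.SchedulingConfigurer;
import org.springframework.scheduling.config.ScheduledTaskRegistrar;

@Configuration
public class ScheduleConfig implements SchedulingConfigurer {

    @Override
    public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
        // Run scheduled tasks on a fixed pool of 30 threads.
        taskRegistrar.setScheduler(Executors.newScheduledThreadPool(30));
    }

}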

Answer 1

Score: 2

The stack trace you have shown is 'situation normal': it is a thread pool executor thread that is ready to do work, but the work queue is empty. In this case, 'waiting' means "I'm waiting for a job to do", not "I have stuff to do, but cannot do it because I am waiting for something else to finish first".

Now, 3000 threads is itself somewhat of a concern; each thread has its own stack space. How large that is depends on your -Xss parameter, but it tends to range from 64k to 1MB. At 1MB each, that's about 3GB of stack space, which is suboptimal. This number (how many threads you have waiting around for a job to accept) should also not grow much once the VM has warmed up.

If all/most of those WAITING threads have a similar trace, then there are really only two options:

You made an executor and keep asking it, over time, to add more and more threads. I doubt this, but it's possible.
You keep making executors. Don't do this.
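One way to tell these two cases apart is to count live threads per pool. The sketch below assumes the pools use the JDK's default thread factory, which names threads pool-N-thread-M (as in the pool-96-thread-10 trace from the question); PoolCensus and its methods are hypothetical names made up for illustration. Many distinct pool-N prefixes with a handful of threads each would point at executors being created over and over, while one prefix with thousands of threads would point at a single pool that keeps growing.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PoolCensus {
    // Threads from the JDK's default thread factory are named "pool-N-thread-M".
    private static final Pattern DEFAULT_NAME = Pattern.compile("^(pool-\\d+)-thread-\\d+$");

    /** Returns "pool-N" for a default-named pool thread, or null for any other name. */
    static String poolOf(String threadName) {
        Matcher m = DEFAULT_NAME.matcher(threadName);
        return m.matches() ? m.group(1) : null;
    }

    /** Counts live threads per default-named pool in the current JVM. */
    static Map<String, Integer> countByPool() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String pool = poolOf(t.getName());
            if (pool != null) {
                counts.merge(pool, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countByPool();
        System.out.println(counts.size() + " pools with live threads: " + counts);
    }
}
```

This has to run inside the affected JVM (for example from a debug endpoint) to be useful; the same grouping can also be done by hand on the thread list that arthas or jstack prints.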

The idea behind an executor is that you make only one, or at least very very few of these.

If you MUST create them as part of your running app (vs. the normal thing, of creating jobs and feeding them to the singleton executor), then be aware they are effectively resources: if you don't 'close' them, your process will require more and more resources, until eventually the VM will crash when it runs out.

To close them, you invoke shutdown(), which is asking nicely, or shutdownNow(), which is more aggressive and will leave any not-yet-picked-up jobs permanently undone.
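The two calls are often combined: ask nicely first, then escalate if the pool does not terminate in time. The following is a minimal sketch of that pattern using only java.util.concurrent; the class and method names (ExecutorCleanup, shutdownGracefully) and the choice of timeout are assumptions for illustration, not something from the question.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class ExecutorCleanup {
    /**
     * Stops accepting new tasks, waits up to the timeout for the pool to finish,
     * then falls back to shutdownNow(). Returns true if the pool terminated.
     */
    static boolean shutdownGracefully(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // ask nicely: no new tasks are accepted
        try {
            if (!executor.awaitTermination(timeout, unit)) {
                // Escalate: interrupt the workers and take back the queued tasks.
                List<Runnable> dropped = executor.shutdownNow();
                System.err.println("Dropped " + dropped.size() + " queued tasks");
                return executor.awaitTermination(timeout, unit);
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Once a pool has terminated, its worker threads exit, so they no longer show up as WAITING threads.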

So, to recap:

You are making new executors during normal processing in your app. Search for new ScheduledThreadPoolExecutor in your codebase (and for calls to Executors.newScheduledThreadPool, which create the same kind of executor) and inspect the situation. Add some logging if you must to see this in action.
Then, most likely, you want to fix this and stop making new executors in the first place - just make one, once, and feed jobs into this one executor.
If it truly makes sense to make them, use some guardian construct to ensure that you also clean them up when you're done using them. You can search for how to do so safely; it's a bit complicated, as you need to decide what to do with any jobs in the queue that are not yet done. If that's not an issue, it's easy: .shutdown() will get the job done.
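For the logging suggested in the recap, one possible approach is a custom ThreadFactory that prints a stack trace when it is constructed, so every executor creation shows up in the logs along with its call site. This is only a sketch: TracingThreadFactory, newTracedScheduler and the purpose label are hypothetical names, and a real application would likely log through its logging framework rather than printStackTrace().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class TracingThreadFactory implements ThreadFactory {
    // Counts how many factories (and therefore pools) have been created so far.
    private static final AtomicInteger FACTORIES = new AtomicInteger();

    private final String prefix;
    private final AtomicInteger threads = new AtomicInteger();

    TracingThreadFactory(String purpose) {
        this.prefix = purpose + "-" + FACTORIES.incrementAndGet();
        // Print the stack of whoever builds the pool; repeated entries point at the leak.
        new Throwable("executor created: " + prefix).printStackTrace();
    }

    @Override
    public Thread newThread(Runnable task) {
        // Descriptive names make the pool's purpose visible in thread dumps.
        return new Thread(task, prefix + "-thread-" + threads.incrementAndGet());
    }

    /** Builds a scheduled pool whose creation site is logged. */
    static ScheduledExecutorService newTracedScheduler(String purpose, int size) {
        return Executors.newScheduledThreadPool(size, new TracingThreadFactory(purpose));
    }
}
```

If the same call site appears in the log again and again while the app is running, that is the code path that keeps creating executors.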
