Java 8 sequential streams cause very high CPU usage


In my Spring Boot service, I validate incoming orders based on order details and customer details.

In the customer details, I have several lists of objects such as Services, Attributes, and Products, and for every list I do something like the following:

I use streams like this many times for products, services, and attributes. Performance-wise, we observed very high TPS, and memory usage is also very good. However, CPU consumption is very high. We run the service in Kubernetes pods, and it uses 90% of the CPU provided.

One more interesting observation: the more CPU we give it, the higher the TPS we achieve, and CPU usage still reaches 90%.

Is it because Streams consume more CPU? Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?


Further investigation with load testing showed that:

  • Whenever we increase the number of concurrent threads, the service stops responding because of high CPU usage. CPU usage then drops suddenly, which results in low TPS.
  • Whenever we decrease the number of concurrent threads, CPU usage remains high, but the service performs at its best, i.e. high TPS.

The following are the TPS vs. CPU statistics under different CPU/thread configurations.

CPU: 1500m, Threads: 70

| TPS | 176  | 140 | 125 | 79 | 63 |
|-----|------|-----|-----|----|----|
| CPU | 1052 | 405 | 201 | 84 | 13 |

CPU: 1500m, Threads: 35

| TPS | 500  | 510  | 500  | 530  |
|-----|------|------|------|------|
| CPU | 1172 | 1349 | 1310 | 1214 |

CPU: 2500m, Threads: 70

| TPS | 20   | 20   | 25   | 28  | 26 |
|-----|------|------|------|-----|----|
| CPU | 2063 | 2429 | 2303 | 879 | 35 |

CPU: 2500m, Threads: 35

| TPS | 1193 | 1200 | 1200 | 1230 |
|-----|------|------|------|------|
| CPU | 600  | 1908 | 2044 | 1949 |

Tomcat Configuration Used:


The thread dump analysis shows that 80% of the http-nio threads are in the "waiting on condition" state. That means the threads are waiting for something and none of them is consuming CPU, which explains the low CPU usage. But what could be causing the threads to wait? I am not using any asynchronous calls in the service, and I am not using parallel streams either, only sequential streams as mentioned above.

The following is the thread dump taken when CPU and TPS go down:

"http-nio-8090-exec-72" #125 daemon prio=5 os_prio=0 tid=0x00007f014001e800 nid=0x8f waiting on condition [0x00007f0158ae1000]
   java.lang.Thread.State: **TIMED_WAITING** (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000d7470b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(
	at java.util.concurrent.LinkedBlockingQueue.poll(
	at org.apache.tomcat.util.threads.TaskQueue.poll(
	at org.apache.tomcat.util.threads.TaskQueue.poll(
	at java.util.concurrent.ThreadPoolExecutor.getTask(
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
	at org.apache.tomcat.util.threads.TaskThread$

   Locked ownable synchronizers:
	- None


Score: 1

> Is it because Streams consume more CPU? Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?

Clearly streams do consume CPU. And generally speaking, code implemented using non-parallel streams does run a bit slower than code implemented using old-fashioned loops. However, the difference in performance is not huge. (Maybe 5 or 10%?)

In general, a stream does not generate more garbage than an old-fashioned loop performing the same computation. For instance, if we compared your example with a loop doing the same thing (i.e. generating a new list), I would expect a 1-to-1 correspondence between the memory allocations of the two versions.
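To make that comparison concrete, here is a minimal sketch of the same filtering step written once with a stream and once with a loop. The question's actual snippet is not shown, so the `Product` class, its fields, and the method names are hypothetical stand-ins. Both versions build one new result list; the stream adds only a few short-lived pipeline objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamVsLoop {

    // Hypothetical stand-in for one entry of a customer's product list.
    static final class Product {
        final String name;
        final boolean active;

        Product(String name, boolean active) {
            this.name = name;
            this.active = active;
        }
    }

    // Stream version: filters, maps, and collects into a new list.
    static List<String> activeNamesWithStream(List<Product> products) {
        return products.stream()
                .filter(p -> p.active)
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    // Loop version: does the same work and also produces a new list.
    static List<String> activeNamesWithLoop(List<Product> products) {
        List<String> names = new ArrayList<>();
        for (Product p : products) {
            if (p.active) {
                names.add(p.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product("broadband", true),
                new Product("tv", false),
                new Product("mobile", true));

        System.out.println(activeNamesWithStream(products)); // [broadband, mobile]
        System.out.println(activeNamesWithLoop(products));   // [broadband, mobile]
    }
}
```

If profiling showed that one of these forms dominated CPU time, swapping it for the other would be a local change, but the expectation stated above is that the difference is small.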

In short, I don't think streams are directly implicated in this. Obviously, if your service is processing a lot of lists (using streams or loops) for each request, then that is going to affect the TPS. And even more so if the lists are actually fetched from your backend database. But that's normal too. This could be addressed by doing things like request caching, and tweaking the granularity of API requests to avoid computing expensive results that the caller doesn't actually need.

(I would NOT recommend adding parallel() to your streams in your scenario. Since your service is already compute (or swap) bound, there are no "spare" cycles to run the streams in parallel. Using parallel() here is likely to reduce your TPS.)

The second part of your question is about performance (TPS) versus the thread count versus (we think) VCPUs. It is not possible to interpret the results you have given, because you don't explain the units of measurement and because I suspect that there are other factors in play.

However, as a general rule:

  • Adding more threads when an application is compute intensive doesn't help.
  • More threads means more memory utilization (thread stacks + objects only reachable from thread stacks).
  • More memory utilization means the GC will work less efficiently.
  • If your JVM is using more virtual memory than you have physical memory, then the OS will typically have to swap pages from RAM to disk and back. This hurts performance, especially during garbage collection.
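As an illustration of the first rule, the sketch below sizes a worker pool from the CPU count the JVM reports instead of using a large fixed number. The class, the `busyWork` task, and the sizing heuristic are assumptions made for illustration and are not taken from the question's service. Note also that on some older Java 8 builds the JVM may not see a container's CPU limit, so the reported count can be higher than what the pod is actually allowed to use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundPool {

    // Assumed heuristic: for compute-bound work, start with about one thread per CPU.
    static int suggestedPoolSize() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Hypothetical CPU-heavy task standing in for order validation.
    static long busyWork(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs `tasks` copies of busyWork(n) on a CPU-sized pool and sums the results.
    static long runAll(int tasks, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(suggestedPoolSize());
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                Callable<Long> task = () -> busyWork(n);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool size: " + suggestedPoolSize());
        System.out.println("total: " + runAll(8, 1_000_000));
    }
}
```

Extra threads beyond the CPU count mostly wait for a core while still holding stack memory, which is the effect the bullets above describe.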

It is also possible that some effects can be attributed to your cloud platform. For example, if you are running in a virtual server on a compute node with lots of virtual servers, you may not get a full CPU's worth per VCPU. And if your virtual server is generating a lot of swap traffic, that will most likely reduce your server's share of the CPU resources even further.

We cannot say what is actually causing your problem, but if I were in your shoes I would be looking at the Java GC logs, and using OS tools like vmstat and iostat to look for signs of excessive paging and excessive I/O in general.
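Alongside the GC logs, a rough in-process view of GC activity can be read from the standard `java.lang.management` beans. The sketch below is only an illustration: the class and method names are made up, and the idea of comparing accumulated GC time with JVM uptime is a coarse heuristic, not a replacement for reading the logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {

    // Sums the accumulated collection time in milliseconds across all collectors.
    // A collector reports -1 when the value is undefined, so negatives are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    // Sums the number of collections across all collectors, skipping undefined values.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time " + totalGcTimeMillis()
                + " ms over " + uptime + " ms of uptime, "
                + totalGcCount() + " collections");
    }
}
```

If the GC share of uptime grows sharply when the thread count is raised, that would support the memory-pressure explanation; if it stays small, the cause is more likely elsewhere.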


Score: 0

> Is it because Streams consume more CPU?

I assume you mean: do streams consume more CPU than loops?<br>
There does not seem to be much difference if the loop and the stream are doing the same things.<br>
Depending on the exact case, there might be small differences. Here are 2 other articles about this question (with this outcome): <br>

> Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?

Based on your code snippet, this question cannot be answered. I can't tell whether any objects become unreferenced, which is what would give the garbage collector something to do.<br>
This question explains what triggers garbage collection:<br>
But your question contains no information about memory usage.

If you want to tune your code, using parallel streams could be an option:

You will find several articles (e.g. see my first link) that conclude there are scenarios where parallel streams really are faster. So you could try this to increase performance.
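As a minimal sketch of what that change looks like, the example below runs the same filter sequentially and in parallel. The `isValid` rule and the class name are hypothetical. Parallel streams run on the JVM-wide common ForkJoinPool by default, so any gain depends on idle cores being available; on a service that is already CPU-bound it would need to be measured before being adopted.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelSketch {

    // Hypothetical per-element check standing in for a validation rule.
    static boolean isValid(int id) {
        return id % 3 == 0;
    }

    // Sequential pipeline: runs entirely on the calling thread.
    static List<Integer> validSequential(List<Integer> ids) {
        return ids.stream()
                .filter(ParallelSketch::isValid)
                .collect(Collectors.toList());
    }

    // Parallel pipeline: splits the work across the common ForkJoinPool.
    // The collected list keeps the encounter order of the source list.
    static List<Integer> validParallel(List<Integer> ids) {
        return ids.parallelStream()
                .filter(ParallelSketch::isValid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ids = IntStream.rangeClosed(1, 1000)
                .boxed()
                .collect(Collectors.toList());

        // Both pipelines produce the same result; only the scheduling differs.
        System.out.println(validSequential(ids).equals(validParallel(ids))); // true
    }
}
```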

