Java 8 sequential streams cause very high CPU usage

Question

In my Spring Boot service, I validate incoming orders based on order details and customer details.

The customer details contain several lists of objects (Services, Attributes, Products, etc.), and for every list I do something like the following:

products.stream()
        .filter(Objects::nonNull)
        .map(Product::getResource)
        .filter(Objects::nonNull)
        .filter(<SimplePredicate>)
        .collect(Collectors.toList());

I use streams like this many times for products, services, and attributes. Performance-wise, this gives very high TPS, and memory usage is also good. But it consumes a lot of CPU: we run the service in Kubernetes pods, and it uses 90% of the CPU provided.

Another interesting observation: the more CPU we give it, the higher the TPS, and CPU usage still reaches 90%.

Is this because streams consume more CPU? Or is it because of heavy garbage collection, since the internal memory might be garbage collected after every stream iteration?

EDIT-1:

Further investigation with load testing showed that:

  • Whenever we increase the number of concurrent threads, the service stops responding because of high CPU usage; CPU usage then drops suddenly, resulting in low TPS.
  • Whenever we decrease the number of concurrent threads, CPU usage remains high but the service performs at its best, i.e. high TPS.

The following are the TPS vs. CPU statistics under different CPU/thread configurations.

CPU: 1500m, Threads: 70

| TPS | 176  | 140 | 125 | 79 | 63 |
|-----|------|-----|-----|----|----|
| CPU | 1052 | 405 | 201 | 84 | 13 |

CPU: 1500m, Threads: 35

| TPS | 500  | 510  | 500  | 530  |
|-----|------|------|------|------|
| CPU | 1172 | 1349 | 1310 | 1214 |

CPU: 2500m, Threads: 70

| TPS | 20   | 20   | 25   | 28  | 26 |
|-----|------|------|------|-----|----|
| CPU | 2063 | 2429 | 2303 | 879 | 35 |

CPU: 2500m, Threads: 35

| TPS | 1193 | 1200 | 1200 | 1230 |
|-----|------|------|------|------|
| CPU | 600  | 1908 | 2044 | 1949 |

Tomcat Configuration Used:

server.tomcat.max-connections=100
server.tomcat.max-threads=100
server.tomcat.min-spare-threads=5

EDIT-2:

Thread dump analysis shows that 80% of the http-nio threads are in the "waiting on condition" state. That means the threads are all waiting for something and none of them is consuming CPU, which explains the low CPU usage. But what could be causing the threads to wait? I am not using any asynchronous calls in the service, nor any parallel streams; only sequential streams as shown above.

The following is the thread dump taken when CPU and TPS go down:

"http-nio-8090-exec-72" #125 daemon prio=5 os_prio=0 tid=0x00007f014001e800 nid=0x8f waiting on condition [0x00007f0158ae1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000d7470b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:89)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:33)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None
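
A note for anyone reading this dump: the frames `TaskQueue.poll` called from `ThreadPoolExecutor.getTask` are what a pool worker looks like while it waits for its next task, so a thread in this state is idle rather than stuck inside request code. The sketch below reproduces the same state with a plain `java.util.concurrent.ThreadPoolExecutor`. The class name `IdleWorkerStateDemo` is made up for illustration, and `allowCoreThreadTimeOut(true)` is used only to force the timed `poll` path; that is an assumption about how to mimic Tomcat's spare threads, not Tomcat's actual configuration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class IdleWorkerStateDemo {

    /** Runs one trivial task, then reports the state of the worker once it is idle. */
    static Thread.State idleWorkerState() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // Assumption: let the core thread time out so that it waits with
        // poll(timeout), the same call that appears in the dump above.
        pool.allowCoreThreadTimeOut(true);
        try {
            AtomicReference<Thread> worker = new AtomicReference<>();
            // The task only records which thread executed it.
            pool.submit(() -> worker.set(Thread.currentThread())).get();

            // Give the worker a moment to go back to the queue and park.
            Thread.State state = worker.get().getState();
            for (int i = 0; i < 200 && state != Thread.State.TIMED_WAITING; i++) {
                Thread.sleep(10);
                state = worker.get().getState();
            }
            return state;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Idle worker state: " + idleWorkerState());
    }
}
```

If most request threads look like this while TPS is low, the open question is why so few requests reach the pool, not what the threads are doing.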

Answer 1

Score: 1

> Is it because Streams consume more CPU? Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?

Streams clearly do consume CPU, and generally speaking, code written with non-parallel streams runs a bit slower than the same code written with old-fashioned loops. However, the difference in performance is not huge. (Maybe 5 or 10%?)

In general, a stream does not generate more garbage than an old-fashioned loop performing the same computation. For instance, if we compared your example with a loop doing the same thing (i.e. generating a new list), I would expect a 1-to-1 correspondence between the memory allocations of the two versions.
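
To make the comparison concrete, here is a rough sketch of the pipeline from the question next to a loop that builds the same list. `Product` and `Resource` are hypothetical stand-ins (only `getResource()` appears in the question), and the `keep` predicate takes the place of `<SimplePredicate>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class StreamVsLoop {

    // Hypothetical stand-ins for the question's domain types.
    static final class Resource {
        final String name;
        Resource(String name) { this.name = name; }
    }

    static final class Product {
        private final Resource resource;
        Product(Resource resource) { this.resource = resource; }
        Resource getResource() { return resource; }
    }

    /** The pipeline from the question, with the predicate passed in. */
    static List<Resource> withStream(List<Product> products, Predicate<Resource> keep) {
        return products.stream()
                .filter(Objects::nonNull)
                .map(Product::getResource)
                .filter(Objects::nonNull)
                .filter(keep)
                .collect(Collectors.toList());
    }

    /** The same computation as a plain loop; it also allocates one result list. */
    static List<Resource> withLoop(List<Product> products, Predicate<Resource> keep) {
        List<Resource> result = new ArrayList<>();
        for (Product product : products) {
            if (product == null) {
                continue; // skip null products, like the first filter
            }
            Resource resource = product.getResource();
            if (resource != null && keep.test(resource)) {
                result.add(resource);
            }
        }
        return result;
    }
}
```

Both methods return the same elements in the same order, so swapping one for the other is a cheap experiment if you want to rule streams in or out.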

In short, I don't think streams are directly implicated in this. Obviously, if your service is processing a lot of lists (using streams or loops) for each request, then that is going to affect the TPS, and even more so if the lists are actually fetched from your backend database. But that's normal too. This could be addressed by doing things like request caching, and tweaking the granularity of API requests to avoid computing expensive results that the caller doesn't actually need.

(I would NOT recommend adding parallel() to your streams in your scenario. Since your service is already compute (or swap) bound, there are no "spare" cycles to run the streams in parallel. Using parallel() here is likely to reduce your TPS.)
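
One way to see why: by default a parallel stream runs on the calling thread plus workers from the JVM-wide `ForkJoinPool.commonPool()`, so every request thread that calls `parallel()` competes for the same cores. The sketch below, with the made-up class name `ParallelStreamThreadsDemo`, records which threads execute a parallel pipeline.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class ParallelStreamThreadsDemo {

    /** Returns the names of the threads that executed a parallel pipeline over n elements. */
    static Set<String> threadsUsed(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, n)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used:            " + threadsUsed(100_000));
    }
}
```

The exact set of thread names varies from run to run and from machine to machine.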

The second part of your question is about performance (TPS) versus the thread count versus (we think) VCPUs. It is not possible to interpret the results you have given, because you don't explain the units of measurement, and because I suspect that there are other factors in play.

However, as a general rule:

  • Adding more threads when an application is compute intensive doesn't help.
  • More threads means more memory utilization (thread stacks + objects only reachable from thread stacks).
  • More memory utilization means the GC will be less efficient.
  • If your JVM is using more virtual memory than you have physical memory, then the OS will typically have to swap pages from RAM to disk and back. This hurts performance, especially during garbage collection.
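
To relate these rules to a running pod, it can help to log what the JVM itself believes it has. The following is a minimal sketch using only standard `java.lang` and `java.lang.management` calls; the class name `JvmResourceView` is hypothetical. Inside a container, the value reported by `availableProcessors()` depends on the JVM version and flags, so treat it as something to check rather than assume.

```java
import java.lang.management.ManagementFactory;

class JvmResourceView {

    /** Summarizes the CPU count, maximum heap and live thread count seen by this JVM. */
    static String describe() {
        Runtime runtime = Runtime.getRuntime();
        int cpus = runtime.availableProcessors();
        long maxHeapMb = runtime.maxMemory() / (1024 * 1024);
        // Counts all live threads, daemon and non-daemon.
        int liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        return "cpus=" + cpus + " maxHeapMb=" + maxHeapMb + " liveThreads=" + liveThreads;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Comparing `cpus` with the configured request thread count gives a first hint of whether the service is oversubscribed.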

It is also possible that there are effects that can be attributed to your cloud platform. For example, if you are running in a virtual server on a compute node with lots of virtual servers, you may not get a full CPU's worth per VCPU. And if your virtual server is generating a lot of swap traffic, that will most likely reduce your server's share of the CPU resources even further.

We cannot say what is actually causing your problem, but if I were in your shoes I would be looking at the Java GC logs, and using OS tools like vmstat and iostat to look for signs of excessive paging and excessive I/O in general.
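
Alongside the GC logs, the JVM also exposes cumulative GC counters through `java.lang.management`. The following is a minimal sketch (the class name `GcSnapshot` is hypothetical) that sums the collection counts and accumulated collection time over all collectors. Sampling it before and after a load test gives a rough idea of how much time went to garbage collection; it is not a substitute for the GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcSnapshot {

    /** Returns {total collections, total collection time in ms} since JVM start. */
    static long[] totals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    public static void main(String[] args) {
        long[] t = totals();
        System.out.println("collections=" + t[0] + " gcTimeMs=" + t[1]);
    }
}
```

If the GC time grows by only a small fraction of the test's wall-clock time, garbage collection is unlikely to be the main consumer of CPU.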

Answer 2

Score: 0

> Is it because Streams consume more CPU?

I assume you mean: do streams consume more CPU than loops?
It seems there is not much difference if the loop and the stream are doing the same things.
Depending on the exact case, there might be small differences. Here are two articles about this question (with this outcome):
https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html
https://dzone.com/articles/java-performance-for-looping-vs-streaming

> Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?

This question cannot be answered based on your code snippet. I can't tell whether some objects are no longer referenced, which would give the garbage collector something to do.
This question explains what triggers garbage collection:
https://stackoverflow.com/questions/17483415/what-triggers-garbage-collection
But your question contains no information about memory usage.

If you want to tune your code, using parallel streams could be an option:

products.stream().parallel()  
   .filter(Objects::nonNull)
   ...

You will find several articles (e.g. see my first link) that conclude there are scenarios where parallel streams really are faster. So you could try this to increase performance.

Published on 2020-10-21 16:02:35. Original link: https://go.coder-hub.com/64459192.html