Posted: 2020-10-21 16:02:35

Java 8 sequential streams cause very high CPU usage

Question

In my Spring Boot service, I validate incoming orders based on order details and customer details.

The customer details contain several lists of objects (Services, Attributes, Products, etc.), and for every list I do something like the following:

    products.stream()
            .filter(Objects::nonNull)
            .map(Product::getResource)
            .filter(Objects::nonNull)
            .filter(<SimplePredicate>)
            .collect(Collectors.toList());

I use streams like this many times for products, services, and attributes. Performance-wise, this gives very high TPS, and memory usage is also close to optimal. But it consumes a lot of CPU. We run the service in Kubernetes pods, and it uses 90% of the CPU provided.

One more interesting observation: the more CPU we give it, the higher the TPS we achieve, and CPU usage still reaches 90%.

Is it because Streams consume more CPU? Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?

EDIT-1:

Upon further investigation with load testing, we observed the following:

Whenever we increase the number of concurrent threads, the service stops responding because of high CPU usage; CPU usage then drops suddenly, resulting in low TPS.
Whenever we decrease the number of concurrent threads, CPU usage remains high, but the service performs optimally, i.e. with high TPS.

The following are the TPS vs. CPU statistics under different CPU/thread configurations.

CPU: 1500m, Threads: 70

| TPS | 176  | 140 | 125 | 79 | 63 |
|-----|------|-----|-----|----|----|
| CPU | 1052 | 405 | 201 | 84 | 13 |

CPU: 1500m, Threads: 35

| TPS | 500  | 510  | 500  | 530  |
|-----|------|------|------|------|
| CPU | 1172 | 1349 | 1310 | 1214 |

CPU: 2500m, Threads: 70

| TPS | 20   | 20   | 25   | 28  | 26 |
|-----|------|------|------|-----|----|
| CPU | 2063 | 2429 | 2303 | 879 | 35 |

CPU: 2500m, Threads: 35

| TPS | 1193 | 1200 | 1200 | 1230 |
|-----|------|------|------|------|
| CPU | 600  | 1908 | 2044 | 1949 |

Tomcat Configuration Used:

server.tomcat.max-connections=100
server.tomcat.max-threads=100
server.tomcat.min-spare-threads=5

EDIT-2:

The thread dump analysis says that 80% of the http-nio threads are in the "waiting on condition" state. That means these threads are waiting for something and none of them is consuming CPU, which explains the low CPU usage. But what could be causing the threads to wait? I'm not using any asynchronous calls in the service, and I'm not using parallel streams either, only sequential streams as shown above.

The following is the thread dump taken when CPU and TPS go down:

"http-nio-8090-exec-72" #125 daemon prio=5 os_prio=0 tid=0x00007f014001e800 nid=0x8f waiting on condition [0x00007f0158ae1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000d7470b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:89)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:33)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

Answer 1

Score: 1

> Is it because Streams consume more CPU? Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?

Clearly streams do consume CPU. And generally speaking, code implemented using non-parallel streams does run a bit slower than code implemented using old-fashioned loops. However, the difference in performance is not huge. (Maybe 5 or 10%?)

In general, a stream does not generate more garbage than an old-fashioned loop performing the same computation. For instance, if we compared your example with a loop doing the same thing (i.e. generating a new list), I would expect a one-to-one correspondence between the memory allocations of the two versions.
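To make that comparison concrete, here is a minimal sketch of the question's pipeline next to a hand-written loop that does the same work. The `Product` class and `SIMPLE_PREDICATE` are hypothetical stand-ins, since the question does not show the real type or the real `<SimplePredicate>`. Both versions allocate one result list and add the same elements to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class StreamVsLoop {
    // Hypothetical stand-in for the question's Product type.
    static final class Product {
        private final String resource;
        Product(String resource) { this.resource = resource; }
        String getResource() { return resource; }
    }

    // Hypothetical stand-in for the elided <SimplePredicate>.
    static final Predicate<String> SIMPLE_PREDICATE = r -> r.startsWith("res");

    // The pipeline from the question.
    static List<String> withStream(List<Product> products) {
        return products.stream()
                .filter(Objects::nonNull)
                .map(Product::getResource)
                .filter(Objects::nonNull)
                .filter(SIMPLE_PREDICATE)
                .collect(Collectors.toList());
    }

    // The same computation as an old-fashioned loop.
    static List<String> withLoop(List<Product> products) {
        List<String> result = new ArrayList<>();
        for (Product p : products) {
            if (p == null) {
                continue;
            }
            String r = p.getResource();
            if (r != null && SIMPLE_PREDICATE.test(r)) {
                result.add(r);
            }
        }
        return result;
    }

    // Sample input with a null element and a null resource.
    static List<Product> sample() {
        return Arrays.asList(
                new Product("res-1"), null, new Product(null),
                new Product("other"), new Product("res-2"));
    }

    public static void main(String[] args) {
        System.out.println(withStream(sample())); // [res-1, res-2]
        System.out.println(withLoop(sample()));   // [res-1, res-2]
    }
}
```

If you want to measure the actual difference for your own data, a benchmark harness such as JMH would be more trustworthy than timing this by hand.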

In short, I don't think streams are directly implicated in this. Obviously, if your service processes a lot of lists (using streams or loops) for each request, that is going to affect the TPS, and even more so if the lists are actually fetched from your backend database. But that's normal too. It could be addressed by things like request caching, and by tweaking the granularity of API requests to avoid computing expensive results that the caller doesn't actually need.

(I would NOT recommend adding parallel() to your streams in your scenario. Since your service is already compute (or swap) bound, there are no "spare" cycles to run the streams in parallel. Using parallel() here is likely to reduce your TPS.)
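One way to see why: by default, parallel streams run their work on the JVM-wide common `ForkJoinPool` (plus the calling thread), and that pool is shared by every request thread. The sketch below only prints the size of that shared pool and checks that a parallel pipeline returns the same list as the sequential one; the `evens` helper is a made-up example, not code from the question.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamPool {
    // Number of worker threads in the common pool that parallel streams share.
    static int sharedWorkers() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // Hypothetical pipeline: even numbers from 1 to 1000, sequential or parallel.
    static List<Integer> evens(boolean parallel) {
        IntStream numbers = IntStream.rangeClosed(1, 1000);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool workers:  " + sharedWorkers());
        System.out.println("same result: " + evens(true).equals(evens(false)));
    }
}
```

With 35 or 70 request threads all submitting to that one small pool, the parallel version has nothing extra to run on when the CPUs are already saturated.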

The second part of your question is about performance (TPS) versus the thread count versus (we think) VCPUs. It is not possible to interpret the results you have given, because you don't explain the units of measurement and because I suspect that there are other factors in play.

However, as a general rule:

Adding more threads when an application is compute intensive doesn't help.
More threads means more memory utilization (thread stacks + objects only reachable from thread stacks).
More memory utilization means the GC will be less ergonomic.
If your JVM is using more virtual memory than you have physical memory, then the OS will typically have to swap pages from RAM to disk and back. This hurts performance, especially during garbage collection.
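As an illustration of the first rule, a pool for compute-bound work is usually sized close to the number of CPUs the JVM can actually use, not to the number of concurrent requests. This is only a sketch: `ComputeBoundPool` and its methods are hypothetical names, and in a container `availableProcessors()` reflects the CPU limit only on container-aware JVMs, so check what it returns in your pod.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ComputeBoundPool {
    // Thread count for compute-bound work: one per CPU visible to the JVM.
    static int computeBoundThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Fixed-size pool; extra tasks wait in the queue instead of competing for CPU.
    static ThreadPoolExecutor newPool() {
        int n = computeBoundThreads();
        return new ThreadPoolExecutor(n, n, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool();
        System.out.println("CPUs visible to the JVM: " + computeBoundThreads());
        System.out.println("pool size: " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

The same reasoning may apply to the `server.tomcat.max-threads=100` setting in the question: if each request is mostly computation, a value far above the CPU count adds contention and memory pressure without adding throughput.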

It is also possible that there are effects that can be attributed to your cloud platform. For example, if you are running in a virtual server on a compute node with lots of virtual servers, you may not get a full CPU's worth per VCPU. And if your virtual server is generating a lot of swap traffic, that will most likely reduce your server's share of the CPU resources even further.

We cannot say what is actually causing your problem, but if I were in your shoes I would be looking at the Java GC logs, and using OS tools like vmstat and iostat to look for signs of excessive paging and excessive I/O in general.
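If turning on GC logging is inconvenient (on Java 8 that would be flags such as `-XX:+PrintGCDetails` and `-Xloggc:<file>`), a rough first look is available from inside the JVM through the standard management beans. The sketch below is an assumption about how you might sample them, with a hypothetical `GcSnapshot` class; it is not a replacement for reading the GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcSnapshot {
    // Total number of collections so far, across all collectors.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds, across all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("GC time " + totalGcTimeMillis() + " ms of " + uptime + " ms uptime");
    }
}
```

Sampling these numbers before and after a load test gives the share of wall-clock time spent in GC; if that share is small, GC is probably not where the CPU is going.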

Answer 2

Score: 0

> Is it because Streams consume more CPU?

I assume you mean: do streams consume more CPU than loops?
It seems there is not much difference if the loop and the stream are doing the same thing.
Depending on the exact case, there might be small differences. Here are two articles on this question that reach that conclusion:
https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html
https://dzone.com/articles/java-performance-for-looping-vs-streaming

> Or is it because of high Garbage Collection because after every iteration of Streams the internal memory might be garbage collected?

This question cannot be answered from your code snippet. I can't tell from it whether any objects become unreferenced, which would give the garbage collector something to do.
This question explains what triggers garbage collection:
https://stackoverflow.com/questions/17483415/what-triggers-garbage-collection
But your question contains no information about memory usage.

If you want to tune your code, using parallel streams could be an option:

    products.stream().parallel()
            .filter(Objects::nonNull)
            ...

You will find several articles (e.g. see my first link) that conclude that there are scenarios where parallel streams really are faster. So you could try this to increase performance.
