How to fix Ignite performance issues?

Question


We use Ignite 2.7.6 in both server and client modes: two servers and six clients.

Initially, each app node (with an Ignite client inside) had a 2 GB heap. Each Ignite server node had 24 GB of off-heap memory and a 2 GB heap.

With the last app update we introduced new functionality that required about 2,000 caches (user groups) of 20 entries each. Each cache entry is small, holding at most 10 integers.
These caches are created via the ignite.getOrCreateCache(name) method, so they have the default cache configuration (off-heap, partitioned).

But within an hour of the update we got an OOM error on a server node:

    [00:59:55,628][SEVERE][sys-#44759][GridDhtPartitionsExchangeFuture] Failed to notify listener: o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2@3287dcbd
    java.lang.OutOfMemoryError: Java heap space

Heaps have now been increased to 16 GB on the Ignite server nodes and to 12 GB on the app nodes.

We now see high CPU load of about 250% on all server nodes (20% before the update) and long G1 Young Gen pauses of up to 5 milliseconds (300 microseconds before the update).

The server config is:

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
        <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
            <property name="workDirectory" value="/opt/qwerty/ignite/data"/>
            <property name="gridLogger">
                <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                    <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
                </bean>
            </property>
            <property name="dataStorageConfiguration">
                <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                    <property name="defaultDataRegionConfiguration">
                        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                            <property name="maxSize" value="#{24L * 1024 * 1024 * 1024}"/>
                            <property name="pageEvictionMode" value="RANDOM_LRU"/>
                        </bean>
                    </property>
                </bean>
            </property>
            <property name="discoverySpi">
                <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                    <property name="localAddress" value="host-1.qwerty.srv"/>
                    <property name="ipFinder">
                        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                            <property name="addresses">
                                <list>
                                    <value>host-1.qwerty.srv:47500</value>
                                    <value>host-2.qwerty.srv:47500</value>
                                </list>
                            </property>
                        </bean>
                    </property>
                </bean>
            </property>
            <property name="communicationSpi">
                <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                    <property name="localAddress" value="host-1.qwerty.srv"/>
                </bean>
            </property>
        </bean>
    </beans>

In a memory dump of an Ignite server node we see many org.apache.ignite.internal.marshaller.optimized.OptimizedObjectStreamRegistry$StreamHolder instances of 21 MB each.

The memory leak report shows:

    Problem Suspect 1
    One instance of "org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager" loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100" occupies 529 414 776 (10,39 %) bytes. The memory is accumulated in one instance of "java.util.LinkedList" loaded by "<system class loader>".
    Keywords
        jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100
        java.util.LinkedList
        org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager

    Problem Suspect 2
    384 instances of "org.apache.ignite.thread.IgniteThread", loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100" occupy 3 023 380 000 (59,34 %) bytes.
    Keywords
        org.apache.ignite.thread.IgniteThread
        jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100

    Problem Suspect 3
    1 023 instances of "org.apache.ignite.internal.processors.cache.CacheGroupContext", loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100" occupy 905 077 824 (17,76 %) bytes.
    Keywords
        jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100
        org.apache.ignite.internal.processors.cache.CacheGroupContext

The question is: what have we done wrong? What can we tune? Maybe the problem is in our code, but how can we identify where it is?

Answer 1

Score: 2


2,000 caches is a lot. Each cache probably takes up to 40 MB in internal data structures.
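A rough count shows why so many small caches weigh on the servers. It assumes the defaults that, as far as we know, apply to a cache created with ignite.getOrCreateCache(name) (RendezvousAffinityFunction with 1024 partitions, 0 backups), assumes partitions are spread evenly over the two server nodes, and relies on each cache without a cache group forming its own group with its own partition set (which fits the 1 023 CacheGroupContext instances in the dump).

```latex
\begin{aligned}
\text{partitions, one group per cache} &= 2000 \times 1024 = 2\,048\,000 \\
\text{partitions per server node (0 backups)} &= 2\,048\,000 / 2 = 1\,024\,000 \\
\text{entries in all caches} &= 2000 \times 20 = 40\,000 \\
\text{average entries per partition} &= 40\,000 / 2\,048\,000 \approx 0.02 \\
\text{partitions, one shared cache group} &= 1 \times 1024 = 1024
\end{aligned}
```

Under these assumptions almost every partition is empty, yet each one is still tracked on heap and takes part in every partition map exchange, which is consistent with the exchange-related suspects in the report.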

I recommend at least using the same cacheGroup for all caches of similar purpose and composition, so that they share some of these data structures.
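Because the caches are created with ignite.getOrCreateCache(name) and no explicit configuration, one way to apply this without changing that call may be a cache template in the grid.cfg IgniteConfiguration bean. This is a minimal sketch, not a verified fix. The name prefix userGroup- and the group name userGroups are hypothetical. It assumes that a CacheConfiguration whose name ends with * is registered as a template and is picked up by getOrCreateCache for matching names in 2.7.6, as we understand Ignite's template mechanism. We have not checked whether a template defined only on the servers is visible to the client nodes that create the caches.

```xml
<!-- Hypothetical template: caches whose names start with "userGroup-" (assumed naming)
     are created in one shared cache group instead of one implicit group per cache. -->
<property name="cacheConfiguration">
    <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <!-- Assumption: a name ending in "*" is treated as a template, not as a cache to start. -->
            <property name="name" value="userGroup-*"/>
            <!-- All caches created from this template share one cache group (one partition set). -->
            <property name="groupName" value="userGroups"/>
            <!-- Same mode the caches get by default today, stated explicitly. -->
            <property name="cacheMode" value="PARTITIONED"/>
        </bean>
    </list>
</property>
```

In code, the equivalent would be passing a CacheConfiguration with setGroupName(...) to getOrCreateCache instead of a bare name. As far as we know, caches that already exist keep the group they were created with and would have to be destroyed and recreated to move into the shared group.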

huangapple
  • Published on 17 September 2020 at 17:16:12
  • Please keep the link to this article when reposting: https://go.coder-hub.com/63934924.html