September 17, 2020, 17:16:12

How to fix Ignite performance issues?

Question

We use Ignite 2.7.6 in both server and client modes: two servers and six clients.

At first, each app node with an embedded Ignite client had a 2G heap. Each Ignite server node had 24G of off-heap memory and a 2G heap.

With the last app update we introduced new functionality that required about 2000 caches of 20 entries each (user groups). Each cache entry is small, holding up to 10 integers. These caches are created via the ignite.getOrCreateCache(name) method, so they have the default cache configuration (off-heap, partitioned).

But within an hour after the update we got an OOM error on a server node:

[00:59:55,628][SEVERE][sys-#44759][GridDhtPartitionsExchangeFuture] Failed to notify listener: o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2@3287dcbd
java.lang.OutOfMemoryError: Java heap space

Heaps have now been increased to 16G on the Ignite server nodes and to 12G on the app nodes.

As we can see, all server nodes now have a high CPU load of about 250% (20% before the update) and long G1 Young Gen pauses of up to 5 milliseconds (300 microseconds before the update).

Server config is:

<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
  <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="workDirectory" value="/opt/qwerty/ignite/data"/>
    <property name="gridLogger">
      <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
        <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
      </bean>
    </property>
    <property name="dataStorageConfiguration">
      <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
          <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
            <property name="maxSize" value="#{24L * 1024 * 1024 * 1024}"/>
            <property name="pageEvictionMode" value="RANDOM_LRU"/>
          </bean>
        </property>
      </bean>
    </property>
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="localAddress" value="host-1.qwerty.srv"/>
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <value>host-1.qwerty.srv:47500</value>
                <value>host-2.qwerty.srv:47500</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
    <property name="communicationSpi">
      <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="localAddress" value="host-1.qwerty.srv"/>
      </bean>
    </property>
  </bean>
</beans>

In a memory dump of an Ignite server node we see many org.apache.ignite.internal.marshaller.optimized.OptimizedObjectStreamRegistry$StreamHolder instances of 21 MB.

Memory leak report shows:

Problem Suspect 1
One instance of "org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager" loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100" occupies 529,414,776 (10.39%) bytes. The memory is accumulated in one instance of "java.util.LinkedList" loaded by "<system class loader>".
Keywords
jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100
java.util.LinkedList
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager
Problem Suspect 2
384 instances of "org.apache.ignite.thread.IgniteThread", loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100" occupy 3,023,380,000 (59.34%) bytes.
Keywords
org.apache.ignite.thread.IgniteThread
jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100
Problem Suspect 3
1,023 instances of "org.apache.ignite.internal.processors.cache.CacheGroupContext", loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100" occupy 905,077,824 (17.76%) bytes.
Keywords
jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x400000100
org.apache.ignite.internal.processors.cache.CacheGroupContext

The question is: what have we done wrong? What can we tune? Maybe the problem is in our code, but how can we identify where it is?

Answer 1

Score: 2

2000 caches is a lot. One cache probably takes up to 40M in data structures.
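To put that estimate next to the numbers in the question, here is a rough back-of-the-envelope sketch. It assumes "40M" means 40 MiB per cache and treats it as a worst case rather than a measurement. The second figure is taken from the heap dump report above (1,023 CacheGroupContext instances occupying 905,077,824 bytes). The class and method names are made up for illustration.

```java
public class CacheOverheadEstimate {
    // Figures quoted in the thread; reading "40M" as 40 MiB is an assumption.
    static final long CACHE_COUNT = 2000;
    static final long MAX_BYTES_PER_CACHE = 40L * 1024 * 1024;

    // From the heap dump report in the question.
    static final long DUMP_GROUP_CONTEXTS = 1023;
    static final long DUMP_GROUP_BYTES = 905_077_824L;

    /** Worst-case total if every cache reached the per-cache estimate. */
    static long worstCaseBytes() {
        return CACHE_COUNT * MAX_BYTES_PER_CACHE;
    }

    /** Average bytes per CacheGroupContext reported in the dump (integer division). */
    static long observedBytesPerGroup() {
        return DUMP_GROUP_BYTES / DUMP_GROUP_CONTEXTS;
    }

    public static void main(String[] args) {
        double gib = worstCaseBytes() / (1024.0 * 1024 * 1024);
        System.out.printf("worst case total: %.1f GiB%n", gib);
        System.out.printf("observed per group context: %d KiB%n", observedBytesPerGroup() / 1024);
    }
}
```

Even if the real per-cache cost is well below the worst case, as the dump suggests, multiplying any fixed per-cache overhead by 2000 quickly becomes significant next to the original 2G heap.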

I recommend at least using the same cacheGroup for all caches of similar purpose and composition, so that they share some of these data structures.
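A minimal sketch of what this could look like in place of ignite.getOrCreateCache(name), using CacheConfiguration.setGroupName from the Ignite API. The group name "userGroups", the key/value types and the helper class are assumptions made for illustration; the question does not show the actual types.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.configuration.CacheConfiguration;

public class GroupedCaches {
    // Hypothetical group name shared by all user-group caches.
    static final String GROUP_NAME = "userGroups";

    /** Builds a configuration that differs from the default only by its cache group. */
    static CacheConfiguration<Integer, int[]> groupedConfig(String cacheName) {
        CacheConfiguration<Integer, int[]> cfg = new CacheConfiguration<>(cacheName);
        cfg.setGroupName(GROUP_NAME);
        return cfg;
    }

    /** Intended as a replacement for ignite.getOrCreateCache(cacheName). */
    static IgniteCache<Integer, int[]> getOrCreate(Ignite ignite, String cacheName) {
        return ignite.getOrCreateCache(groupedConfig(cacheName));
    }
}
```

Note that getOrCreateCache returns an existing cache as it is, so caches that were already created without a group would most likely have to be destroyed and recreated before the group setting takes effect.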
