2020年9月18日 17:39:25go评论111阅读模式

英文:

Elasticsearch: restart node after java.lang.OutOfMemoryError: Java heap space

问题

我的一个ES节点由于java.lang.OutOfMemoryError: Java heap space错误而失败。以下是来自日志的完整堆栈跟踪：

[2020-09-18T04:25:04,215][WARN ][o.e.a.b.TransportShardBulkAction] [search1] [[my_index_4][0]] failed to perform indices:data/write/bulk展开收缩 on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk展开收缩[r]] disconnected
...

java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource$GlobalOrdinalValuesSource.<init>(CompositeValuesSource.java:137) ~[elasticsearch-6.2.4.jar:6.2.4]
...

因为上述异常，当我访问任何ES API时，我遇到了master_not_discovered_exception。

问题：有人可以告诉我应该执行的下一步操作，以使Elasticsearch恢复正常状态吗？是否有方法可以重新启动断开的节点？

英文:

One of my ES nodes has failed because of java.lang.OutOfMemoryError: Java heap space error. Here is the full stack trace from the logs:

    [2020-09-18T04:25:04,215][WARN ][o.e.a.b.TransportShardBulkAction] [search1] [[my_index_4][0]] failed to perform indices:data/write/bulk展开收缩 on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk展开收缩[r]] disconnected
[2020-09-18T04:25:04,215][WARN ][o.e.c.a.s.ShardStateAction] [search1] [my_index_4][0] received shard failed for shard id [[my_index_4][0]], allocation id [BUpviwHxQK2qC3GrELC2Hw], primary term [2], message [failed to perform indices:data/write/bulk展开收缩
 on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]], failure [NodeDisconnectedException[[search3][X.X.X.179:9300][indices:data/write/bulk展开收缩[r]] disconnected]]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk展开收缩[r]] disconnected
[2020-09-18T04:25:04,215][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [search1] failed to execute on node [cm_76wfGRFm9nbPR1mJxTQ]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][cluster:monitor/nodes/info[n]] disconnected
[2020-09-18T04:25:04,219][INFO ][o.e.c.r.a.AllocationService] [search1] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[my_index_4][0]] ...]).
[2020-09-18T04:25:05,450][INFO ][o.e.m.j.JvmGcMonitorService] [search1] [gc][11099506] overhead, spent [605ms] collecting in the last [1.4s]
[2020-09-18T04:25:05,453][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [search1] fatal error in thread [elasticsearch[search1][search][T#5]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource$GlobalOrdinalValuesSource.&lt;init&gt;(CompositeValuesSource.java:137) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource.wrapGlobalOrdinals(CompositeValuesSource.java:123) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesComparator.&lt;init&gt;(CompositeValuesComparator.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregator.&lt;init&gt;(CompositeAggregator.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationFactory.createInternal(CompositeAggregationFactory.java:52) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:55) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:105) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$14(IndicesService.java:1133) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService$$Lambda$2241/341562582.accept(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$15(IndicesService.java:1186) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService$$Lambda$2242/1286052129.get(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:412) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1192) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1132) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:305) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:340) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:316) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:312) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1002) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

Because of the exception above, I am getting master_not_discovered_exception when I am hitting any of ES APIs.

Question: Can anyone tell me the next steps that I should perform to put Elasticsearch back to normal state? Is there a way to restart disconnected node?

答案1

得分: 2

首先让我简要解释一下可能导致这个问题的原因：

如日志中所提到的，您似乎在运行昂贵的聚合操作，这通常会占用大量内存，并且已知会消耗大量内存，您的垃圾回收（GC）无法回收这些内存，最终导致您的应用程序（ES）内存耗尽并被终止。
除了日志中显示的昂贵聚合之外，高内存消耗也可能是由于大量的搜索和索引请求引起的，请查看此节点的搜索和索引慢日志，详细信息请参见ES慢日志

现在谈谈解决方案部分

此ES节点已停止运行，这导致了master_not_discovered_exception，因此重要的是重新启动此节点，然后查看是否消除了此异常。

预防内存溢出异常

您应该正确地配置ES中的断路器，如果可能的话，升级到具有基于实际内存的更好断路器的ES 7.X。
改进ES的索引和搜索性能。

英文:

First let me briefly explains what might have caused this issue:

As mentioned in the logs, you seems to be running costly aggregation, which are in general memory intensive and known to consume a lot of memory, which your Garbage collection(GC) was not able to reclaim, and eventually your application(ES) ran out of memory and got killed.
Apart from costly aggregations which is shown in the logs, high memory consumption can also be caused by heavy searches and indexing request, so please have a look at this node's both search and index slow logs, refer ES slow logs for more info

Now coming to resolution part

This ES node is dead, which is causing master_not_discovered_exception hence its important to bring restart this node again and see if this exception goes.

Prevention of OOM exception

You should properly configure the circuit breaker available in ES and if possible upgrade to ES 7.X which has better circuit breakers based on real-memory
Improve ES indexing and search performance.

答案2

得分: 0

`java.lang.OutOfMemoryError: Java heap space` 是由于运行复合聚合查询导致的，我将`size`参数设置为`Integer.MAX_VALUE`：

    {
        "size": 0,
    
        "aggregations": {
            "myParam.keyword": {
                "composite": {
                    "size": 2147483647,
                    "sources": [
                        {
                            "myParam.keyword": {
                                "terms": {
                                    "field": "myParam.keyword",
                                    "order": "asc"
                                }
                            }
                        }
                    ]
                }
            }
        }
    } 

根据堆栈跟踪，错误发生在聚合值数组的初始化时，位于`CompositeValuesSource.java:137`：

    GlobalOrdinalValuesSource(ValuesSource.Bytes.WithOrdinals vs, int size, int reverseMul) {
        super(vs, size, reverseMul);
        this.values = new long[size];
    }
在这里，`size`参数来自查询。

答案 https://stackoverflow.com/a/63965634/5284890 确认了根本原因。

我的下一步是使用以下命令停止并重新运行Elasticsearch：

    sudo systemctl stop elasticsearch.service
    sudo systemctl start elasticsearch.service

接下来，我将查看ES文章中提到的建议断路器，该答案在链接 https://stackoverflow.com/a/63965634/5284890 中提到。

英文:

The java.lang.OutOfMemoryError: Java heap space was caused by running the composite aggregation query for which I set the size parameter to Integer.MAX_VALUE:

{
    &quot;size&quot;: 0,

    &quot;aggregations&quot;: {
        &quot;myParam.keyword&quot;: {
            &quot;composite&quot;: {
                &quot;size&quot;: 2147483647,
                &quot;sources&quot;: [
                    {
                        &quot;myParam.keyword&quot;: {
                            &quot;terms&quot;: {
                                &quot;field&quot;: &quot;myParam.keyword&quot;,
                                &quot;order&quot;: &quot;asc&quot;
                            }
                        }
                    }
                ]
            }
        }
    }
}

According to stack trace, the error occurred while initialization of aggregation values array CompositeValuesSource.java:137:

GlobalOrdinalValuesSource(ValuesSource.Bytes.WithOrdinals vs, int size, int reverseMul) {
    super(vs, size, reverseMul);
    this.values = new long[size];
}

Here, the size parameter is coming from the query.

The answer https://stackoverflow.com/a/63965634/5284890 confirms the root cause.

My next step was stopping and running Elasticsearcch again using the following commands

sudo systemctl stop elasticsearch.service
sudo systemctl start elasticsearch.service

My following steps will be to check suggested circuit breaker in ES article mentioned in this answer https://stackoverflow.com/a/63965634/5284890.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Elasticsearch: 在 java.lang.OutOfMemoryError: Java heap space 后重新启动节点

问题

答案1

答案2

一个未分配分区的偏移量可以通过 KafkaConsumer.commitSync/commitAsync 提交吗？

Mockito的when和thenReturn语句在Java中对于2次方法调用不起作用。

使用java.util.Date在Java中进行不同的时间转换

Spring issue with @PostConstruct and @PreDestroy

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论