Elasticsearch: restart node after java.lang.OutOfMemoryError: Java heap space

Question

One of my ES nodes failed with a java.lang.OutOfMemoryError: Java heap space error. Here is the full stack trace from the logs:

    [2020-09-18T04:25:04,215][WARN ][o.e.a.b.TransportShardBulkAction] [search1] [[my_index_4][0]] failed to perform indices:data/write/bulk on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]
    org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk[r]] disconnected
    [2020-09-18T04:25:04,215][WARN ][o.e.c.a.s.ShardStateAction] [search1] [my_index_4][0] received shard failed for shard id [[my_index_4][0]], allocation id [BUpviwHxQK2qC3GrELC2Hw], primary term [2], message [failed to perform indices:data/write/bulk on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]], failure [NodeDisconnectedException[[search3][X.X.X.179:9300][indices:data/write/bulk[r]] disconnected]]
    org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk[r]] disconnected
    [2020-09-18T04:25:04,215][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [search1] failed to execute on node [cm_76wfGRFm9nbPR1mJxTQ]
    org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][cluster:monitor/nodes/info[n]] disconnected
    [2020-09-18T04:25:04,219][INFO ][o.e.c.r.a.AllocationService] [search1] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[my_index_4][0]] ...]).
    [2020-09-18T04:25:05,450][INFO ][o.e.m.j.JvmGcMonitorService] [search1] [gc][11099506] overhead, spent [605ms] collecting in the last [1.4s]
    [2020-09-18T04:25:05,453][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [search1] fatal error in thread [elasticsearch[search1][search][T#5]], exiting
    java.lang.OutOfMemoryError: Java heap space
        at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource$GlobalOrdinalValuesSource.<init>(CompositeValuesSource.java:137) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource.wrapGlobalOrdinals(CompositeValuesSource.java:123) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesComparator.<init>(CompositeValuesComparator.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregator.<init>(CompositeAggregator.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationFactory.createInternal(CompositeAggregationFactory.java:52) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:55) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:105) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$14(IndicesService.java:1133) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.indices.IndicesService$$Lambda$2241/341562582.accept(Unknown Source) ~[?:?]
        at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$15(IndicesService.java:1186) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.indices.IndicesService$$Lambda$2242/1286052129.get(Unknown Source) ~[?:?]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:412) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1192) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1132) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:305) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:340) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:316) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:312) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1002) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

Because of the exception above, I get a master_not_discovered_exception whenever I call any ES API.

Question: what steps should I take to bring Elasticsearch back to a normal state? Is there a way to restart the disconnected node?

Answer 1

Score: 2

First, let me briefly explain what might have caused this issue:

  1. As the logs show, you seem to be running a costly aggregation. Aggregations are generally memory-intensive, and here the memory consumed could not be reclaimed by garbage collection (GC), so eventually the application (ES) ran out of heap and was killed.
  2. Apart from the costly aggregation shown in the logs, high memory consumption can also be caused by heavy search and indexing requests, so please check both the search and the indexing slow logs of this node; see the ES slow log documentation for more info.
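Slow logs are enabled per index through dynamic index settings. A minimal sketch, assuming you only want WARN-level thresholds: the Java helper below builds a body for `PUT /my_index_4/_settings`. The setting names are the standard slow-log settings; the class name `SlowLogSettings` and the threshold values (10s, 1s, 10s) are illustrative assumptions, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the JSON body for PUT /<index>/_settings that
// turns on WARN-level search and indexing slow logs. Thresholds are examples.
final class SlowLogSettings {

    static String body(String queryWarn, String fetchWarn, String indexWarn) {
        // LinkedHashMap keeps the settings in insertion order in the output.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("index.search.slowlog.threshold.query.warn", queryWarn);
        settings.put("index.search.slowlog.threshold.fetch.warn", fetchWarn);
        settings.put("index.indexing.slowlog.threshold.index.warn", indexWarn);

        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (json.length() > 1) {
                json.append(',');               // separator between entries
            }
            json.append('"').append(e.getKey()).append("\":\"")
                .append(e.getValue()).append('"');
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        // Send the printed body with: PUT /my_index_4/_settings
        System.out.println(body("10s", "1s", "10s"));
    }
}
```

Slow queries and slow indexing requests then show up in the node's slow-log files, which helps separate "one expensive aggregation" from "generally heavy traffic".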

Now, on to the resolution:

This ES node is dead, which is causing the master_not_discovered_exception, so it is important to restart this node and then check whether the exception goes away.
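After restarting the node, one way to check whether a master has been elected again is to call `GET /_cluster/health`: while there is no master, the call returns an error body of type `master_not_discovered_exception` instead of a health `status`. The hypothetical Java helper below only classifies a response body you have already fetched (e.g. with curl); the class name and the abbreviated sample bodies in `main` are assumptions for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies a GET /_cluster/health response body so a
// restart script can tell "no master elected yet" apart from a health status.
final class ClusterHealthCheck {

    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"(green|yellow|red)\"");

    static String classify(String body) {
        if (body.contains("master_not_discovered_exception")) {
            return "NO_MASTER";                     // keep waiting / check the restarted node
        }
        Matcher m = STATUS.matcher(body);
        return m.find() ? m.group(1).toUpperCase(Locale.ROOT) : "UNKNOWN";
    }

    public static void main(String[] args) {
        // Abbreviated sample bodies, for illustration only.
        System.out.println(classify(
                "{\"error\":{\"type\":\"master_not_discovered_exception\"},\"status\":503}"));
        System.out.println(classify(
                "{\"cluster_name\":\"es\",\"status\":\"yellow\",\"number_of_nodes\":3}"));
    }
}
```

This prints `NO_MASTER` and then `YELLOW`. A yellow status means all primary shards are assigned but some replicas are not yet, which is common while the restarted node's shards recover.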

Prevention of OOM exception

  1. Configure the circuit breakers available in ES properly and, if possible, upgrade to ES 7.x, which has improved circuit breakers based on real memory usage.
  2. Improve ES indexing and search performance.
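The idea behind a circuit breaker can be sketched in a few lines: estimate how much memory an operation will need before allocating it, and fail the request instead of the whole node. This is a simplified, hypothetical model (the class and method names are ours, not Elasticsearch's API); the 60% limit mirrors the default of `indices.breaker.request.limit`. In a real cluster you tune the breaker settings (such as `indices.breaker.request.limit` and `indices.breaker.total.limit`) rather than write code like this.

```java
// Hypothetical, simplified "request breaker": reserve the estimated bytes of an
// allocation up front and reject the request once the running total would
// exceed a fraction of the heap. Elasticsearch's real breakers are more involved.
final class RequestBreakerSketch {

    private final long limitBytes;
    private long usedBytes;

    RequestBreakerSketch(long heapBytes, double limitFraction) {
        this.limitBytes = (long) (heapBytes * limitFraction);
    }

    /** Reserves {@code bytes}, or throws instead of letting the allocation exhaust the heap. */
    void reserve(long bytes, String label) {
        // usedBytes <= limitBytes always holds, so the subtraction cannot overflow.
        if (bytes < 0 || bytes > limitBytes - usedBytes) {
            throw new IllegalStateException("[request] data too large for " + label
                    + ": needs " + bytes + " bytes, " + (limitBytes - usedBytes) + " left");
        }
        usedBytes += bytes;
    }

    /** Returns previously reserved bytes once the allocation is no longer needed. */
    void release(long bytes) {
        usedBytes = Math.max(0L, usedBytes - bytes);
    }

    long used() {
        return usedBytes;
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        RequestBreakerSketch breaker = new RequestBreakerSketch(heap, 0.60);
        int size = Integer.MAX_VALUE;               // composite "size" from the query
        try {
            breaker.reserve(8L * size, "composite values long[size]");
            System.out.println("allocation allowed");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design choice is to check before allocating: once `new long[size]` has actually been attempted, the JVM can only throw `OutOfMemoryError`, which in this case took the whole node down.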

Answer 2

Score: 0

The java.lang.OutOfMemoryError: Java heap space was caused by running a composite aggregation query in which I had set the size parameter to Integer.MAX_VALUE (2147483647):

    {
        "size": 0,
        "aggregations": {
            "myParam.keyword": {
                "composite": {
                    "size": 2147483647,
                    "sources": [
                        {
                            "myParam.keyword": {
                                "terms": {
                                    "field": "myParam.keyword",
                                    "order": "asc"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }

According to the stack trace, the error occurred while initializing the aggregation values array at CompositeValuesSource.java:137:

    GlobalOrdinalValuesSource(ValuesSource.Bytes.WithOrdinals vs, int size, int reverseMul) {
        super(vs, size, reverseMul);
        this.values = new long[size];
    }

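Here, the size parameter comes directly from the query, so with Integer.MAX_VALUE the constructor tries to allocate a long array with 2147483647 elements.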
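To put that in numbers: a Java `long` is 8 bytes, so the payload of `new long[2147483647]` alone is 2,147,483,647 × 8 = 17,179,869,176 bytes, about 16 GiB, before any other memory the aggregation needs. The tiny sketch below just reproduces this arithmetic (the class name `ValuesArrayEstimate` is hypothetical); it also shows a common pitfall when doing this math in Java, where multiplying two `int`s overflows before the result is widened to `long`.

```java
import java.util.Locale;

// Worked arithmetic for `new long[size]` with size = Integer.MAX_VALUE.
final class ValuesArrayEstimate {

    static final int BYTES_PER_LONG = Long.BYTES;   // 8

    /** Bytes needed for the long[] payload alone (ignores the array header). */
    static long payloadBytes(int size) {
        return (long) size * BYTES_PER_LONG;        // widen first to avoid int overflow
    }

    public static void main(String[] args) {
        int size = Integer.MAX_VALUE;               // 2147483647 from the query
        long bytes = payloadBytes(size);
        System.out.println(bytes);                                  // 17179869176
        System.out.printf(Locale.ROOT, "%.2f GiB%n", bytes / (double) (1L << 30)); // 16.00 GiB
        System.out.println(size * BYTES_PER_LONG);                  // -8 (int overflow)
    }
}
```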

The answer https://stackoverflow.com/a/63965634/5284890 confirms the root cause.
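Since the root cause is an unbounded `size`, the usual way to read all buckets of a composite aggregation is to page through them with a moderate `size` and the composite `after` parameter, passing the key of the last bucket of the previous page. The sketch below models that loop in memory and builds the matching request body for this query; the class `CompositePager`, the page size and the sample terms are hypothetical, and real code would send each body to `_search` and read the bucket keys from the response. `List.of` needs Java 9+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory model of composite-aggregation paging: each request asks
// for at most `size` buckets after the key of the previous page's last bucket, so
// per-request memory is bounded by `size` instead of by the number of distinct terms.
final class CompositePager {

    /** Up to {@code size} keys strictly greater than {@code afterKey} (null = first page). */
    static List<String> page(NavigableSet<String> sortedTerms, String afterKey, int size) {
        NavigableSet<String> rest =
                afterKey == null ? sortedTerms : sortedTerms.tailSet(afterKey, false);
        List<String> out = new ArrayList<>();
        for (String key : rest) {
            if (out.size() == size) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    /** Request body for one page; "after" is omitted on the first request. */
    static String requestBody(int size, String afterKey) {
        String after = afterKey == null ? ""
                : ",\"after\":{\"myParam.keyword\":\"" + escape(afterKey) + "\"}";
        return "{\"size\":0,\"aggregations\":{\"myParam.keyword\":{\"composite\":{"
                + "\"size\":" + size
                + ",\"sources\":[{\"myParam.keyword\":{\"terms\":"
                + "{\"field\":\"myParam.keyword\",\"order\":\"asc\"}}}]"
                + after + "}}}}";
    }

    // Minimal JSON string escaping for quotes and backslashes only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("a", "b", "c", "d", "e"));
        String after = null;
        List<String> keys;
        while (!(keys = page(terms, after, 2)).isEmpty()) {
            System.out.println(requestBody(2, after));
            System.out.println(keys);               // [a, b], then [c, d], then [e]
            after = keys.get(keys.size() - 1);      // last key becomes the next "after"
        }
    }
}
```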

My next step was to stop and start Elasticsearch again using the following commands:

    sudo systemctl stop elasticsearch.service
    sudo systemctl start elasticsearch.service

Next, I will look into the circuit breaker settings suggested in the ES article mentioned in this answer: https://stackoverflow.com/a/63965634/5284890.

huangapple
  • Posted on 18 September 2020 at 17:39:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63953220.html