Elasticsearch: restart node after java.lang.OutOfMemoryError: Java heap space
Question
One of my ES nodes failed with a java.lang.OutOfMemoryError: Java heap space error. Here is the full stack trace from the logs:
[2020-09-18T04:25:04,215][WARN ][o.e.a.b.TransportShardBulkAction] [search1] [[my_index_4][0]] failed to perform indices:data/write/bulk[s] on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk[s][r]] disconnected
[2020-09-18T04:25:04,215][WARN ][o.e.c.a.s.ShardStateAction] [search1] [my_index_4][0] received shard failed for shard id [[my_index_4][0]], allocation id [BUpviwHxQK2qC3GrELC2Hw], primary term [2], message [failed to perform indices:data/write/bulk[s] on replica [my_index_4][0], node[cm_76wfGRFm9nbPR1mJxTQ], [R], s[STARTED], a[id=BUpviwHxQK2qC3GrELC2Hw]], failure [NodeDisconnectedException[[search3][X.X.X.179:9300][indices:data/write/bulk[s][r]] disconnected]]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][indices:data/write/bulk[s][r]] disconnected
[2020-09-18T04:25:04,215][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [search1] failed to execute on node [cm_76wfGRFm9nbPR1mJxTQ]
org.elasticsearch.transport.NodeDisconnectedException: [search3][X.X.X.179:9300][cluster:monitor/nodes/info[n]] disconnected
[2020-09-18T04:25:04,219][INFO ][o.e.c.r.a.AllocationService] [search1] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[my_index_4][0]] ...]).
[2020-09-18T04:25:05,450][INFO ][o.e.m.j.JvmGcMonitorService] [search1] [gc][11099506] overhead, spent [605ms] collecting in the last [1.4s]
[2020-09-18T04:25:05,453][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [search1] fatal error in thread [elasticsearch[search1][search][T#5]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource$GlobalOrdinalValuesSource.<init>(CompositeValuesSource.java:137) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSource.wrapGlobalOrdinals(CompositeValuesSource.java:123) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesComparator.<init>(CompositeValuesComparator.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregator.<init>(CompositeAggregator.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationFactory.createInternal(CompositeAggregationFactory.java:52) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:216) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:55) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:105) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$14(IndicesService.java:1133) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService$$Lambda$2241/341562582.accept(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$15(IndicesService.java:1186) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService$$Lambda$2242/1286052129.get(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:412) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1192) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1132) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:305) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:340) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:316) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:312) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1002) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Because of the exception above, I get a master_not_discovered_exception whenever I call any of the ES APIs.
Question: Can anyone tell me what steps I should take next to bring Elasticsearch back to a normal state? Is there a way to restart the disconnected node?
Answer 1
Score: 2
First, let me briefly explain what might have caused this issue:
- As the logs show, you seem to be running a costly aggregation. Aggregations are generally memory-intensive; here the garbage collector (GC) was not able to reclaim enough memory, so your application (ES) eventually ran out of heap and the process exited.
- Apart from the costly aggregation shown in the logs, high memory consumption can also be caused by heavy search and indexing requests, so have a look at both the search and index slow logs for this node; refer to the ES slow logs documentation for more information.
Now, coming to the resolution:
This ES node is dead, which is causing the master_not_discovered_exception, so it is important to restart this node and see whether the exception goes away.
Prevention of OOM exceptions
- You should properly configure the circuit breakers available in ES and, if possible, upgrade to ES 7.X, which has better circuit breakers based on real memory usage.
- Improve ES indexing and search performance.
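To make the circuit breaker advice a little more concrete, here is a minimal sketch of the arithmetic behind a percentage-based limit such as indices.breaker.request.limit or indices.breaker.total.limit. The class and method names are hypothetical, the 60% figure is only an illustrative input, and the real breakers work on memory estimates that Elasticsearch tracks internally, so treat this as a way to reason about the numbers rather than as a model of what ES does.

```java
// Illustrative arithmetic only: how a percentage-of-heap limit translates to bytes.
// Class and method names are hypothetical and are not part of Elasticsearch.
final class BreakerLimitSketch {

    // Converts a limit given as a percentage of the JVM heap into bytes.
    static long limitBytes(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    // A request would be rejected if its estimated size exceeds the limit.
    static boolean wouldExceed(long estimatedBytes, long limitBytes) {
        return estimatedBytes > limitBytes;
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory(); // heap available to this JVM
        long limit = limitBytes(heap, 60);            // 60% is an example value
        long estimate = 2147483647L * Long.BYTES;     // a long[] with 2147483647 slots
        System.out.println("limit=" + limit + " bytes, estimate=" + estimate
                + " bytes, exceeds=" + wouldExceed(estimate, limit));
    }
}
```

With an 8 GiB heap and a 60% limit, for example, the threshold is about 4.8 GiB, so any single request estimated above that would be refused instead of being allowed to exhaust the heap.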
Answer 2
Score: 0
The java.lang.OutOfMemoryError: Java heap space was caused by running a composite aggregation query for which I had set the size parameter to Integer.MAX_VALUE:
{
  "size": 0,
  "aggregations": {
    "myParam.keyword": {
      "composite": {
        "size": 2147483647,
        "sources": [
          {
            "myParam.keyword": {
              "terms": {
                "field": "myParam.keyword",
                "order": "asc"
              }
            }
          }
        ]
      }
    }
  }
}
According to the stack trace, the error occurred while initializing the aggregation values array at CompositeValuesSource.java:137:
GlobalOrdinalValuesSource(ValuesSource.Bytes.WithOrdinals vs, int size, int reverseMul) {
    super(vs, size, reverseMul);
    this.values = new long[size];
}
Here, the size parameter comes from the query.
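To see why an array of that length cannot fit in a typical heap, the short sketch below works through the arithmetic. It assumes the array is sized directly from the query's size value, as described above; the class and method names are made up for illustration and the small array header is ignored.

```java
// Rough estimate of the heap needed by `new long[size]`, ignoring the small array header.
// The class name is hypothetical; only the arithmetic matters here.
final class LongArrayHeapEstimate {

    // Each long occupies 8 bytes (Long.BYTES); widen to long before multiplying
    // so the product does not overflow int.
    static long bytesForLongArray(int size) {
        return (long) size * Long.BYTES;
    }

    // Converts a byte count to gibibytes (1 GiB = 2^30 bytes).
    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        int size = Integer.MAX_VALUE; // 2147483647, the "size" sent in the query
        long bytes = bytesForLongArray(size);
        System.out.println(bytes + " bytes, about " + Math.round(toGiB(bytes)) + " GiB");
    }
}
```

For size = 2147483647 the array alone needs 17,179,869,176 bytes, roughly 16 GiB, before any other per-request structures are counted.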
The answer https://stackoverflow.com/a/63965634/5284890 confirms the root cause.
My next step was to stop and start Elasticsearch again using the following commands:
sudo systemctl stop elasticsearch.service
sudo systemctl start elasticsearch.service
My next step will be to look at the circuit breaker settings suggested in the ES article mentioned in that answer: https://stackoverflow.com/a/63965634/5284890.
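Since the oversized size value was the trigger, one possible way to avoid repeating it is to request the composite buckets in small pages and pass the key of the last bucket back through the composite aggregation's after parameter. The sketch below only builds the request bodies; the class name, the "by_key" aggregation and source names, and the page size of 1000 are assumptions made for illustration, and sending the request is left to whatever client is already in use.

```java
// Builds the JSON body for one page of a composite aggregation.
// The class name, aggregation name "by_key" and source name "key" are hypothetical.
final class CompositePageRequest {

    // Assumed page size: small enough that per-request arrays stay tiny.
    static final int PAGE_SIZE = 1000;

    // afterValue is the key of the last bucket of the previous page,
    // or null when requesting the first page.
    static String body(String field, int pageSize, String afterValue) {
        StringBuilder json = new StringBuilder();
        json.append("{\"size\":0,\"aggregations\":{\"by_key\":{\"composite\":{");
        json.append("\"size\":").append(pageSize).append(',');
        if (afterValue != null) {
            json.append("\"after\":{\"key\":\"").append(escape(afterValue)).append("\"},");
        }
        json.append("\"sources\":[{\"key\":{\"terms\":{\"field\":\"")
            .append(escape(field)).append("\"}}}]");
        json.append("}}}}");
        return json.toString();
    }

    // Minimal escaping for backslashes and double quotes; control characters are not handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(body("myParam.keyword", PAGE_SIZE, null));      // first page
        System.out.println(body("myParam.keyword", PAGE_SIZE, "someKey")); // following page
    }
}
```

The caller would repeat the request, feeding in the last bucket's key each time, until a page comes back with no buckets.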