Elasticsearch : "failed to get node info for {IP}" and "noNodeAvailableException" in service log
Question
I am facing an issue that I wasn't seeing earlier.
I am attaching logs from my service and Elasticsearch (2.4.4):
2020-05-30 06:29:44.576 INFO 24787 --- [generic][T#287]] org.elasticsearch.client.transport : [Shatter] failed to get node info for {#transport#-1}{172.17.0.1}{172.17.0.1:9300}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][172.17.0.1:9300][cluster:monitor/nodes/liveness] request_id [10242] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:698) ~[elasticsearch-2.4.4.jar!/:2.4.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_242]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]
Elasticsearch logs:
[2020-05-30 06:29:46,784][INFO ][monitor.jvm ] [Tempo] [gc][old][230125][41498] duration [8.2s], collections [1]/[9s], total [8.2s]/[10.7h], memory [473.2mb]->[426.1mb]/[494.9mb], all_pools {[young] [131.8mb]->[84.7mb]/[136.5mb]}{[survivor] [0b]->[0b]/[17mb]}{[old] [341.3mb]->[341.3mb]/[341.3mb]}
[2020-05-30 06:33:47,782][INFO ][monitor.jvm ] [Tempo] [gc][old][230340][41540] duration [7s], collections [1]/[7.8s], total [7s]/[10.7h], memory [493.3mb]->[425mb]/[494.9mb], all_pools {[young] [136.5mb]->[83.6mb]/[136.5mb]}{[survivor] [15.4mb]->[0b]/[17mb]}{[old] [341.3mb]->[341.3mb]/[341.3mb]}
[2020-05-30 06:37:59,384][INFO ][monitor.jvm ] [Tempo] [gc][old][230569][41582] duration [6.9s], collections [1]/[7.2s], total [6.9s]/[10.7h], memory [494.8mb]->[424.7mb]/[494.9mb], all_pools {[young] [136.5mb]->[83.4mb]/[136.5mb]}{[survivor] [16.9mb]->[0b]/[17mb]}{[old] [341.3mb]->[341.3mb]/[341.3mb]}
I am not facing the issue in my development environment; however, when I deploy on EC2 I am getting this. Furthermore, when I restart Elasticsearch it works absolutely fine with no issues, but after 10-15 minutes or less, depending on the volume of search or insertion queries, the error message appears.
Also, my storage space on the instance is 74% consumed (94G out of 120G).
Can it be because of memory?
I am pretty sure my res-client code is fine, as it has been working in production for a long time.
Can it be a port issue? I am running Elasticsearch in a Docker container.
Any help will be appreciated.
Answer 1
Score: 1
I think your heap size for Elasticsearch is very low: your GC logs show a maximum heap of only ~495mb, with the old generation pinned at its 341.3mb limit, and old-generation collections of 7-9 seconds, which is longer than the client's 5000ms liveness timeout. My best guess is that increasing the heap size will solve the problem.
As for why this is happening now, I think it's because the volume of data has increased over time.
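In Elasticsearch 2.x the heap is set through the ES_HEAP_SIZE environment variable (the jvm.options file only arrived in 5.x). A minimal sketch for a Docker deployment, assuming the official elasticsearch:2.4.4 image; the container name, port mappings, and the 2g value are placeholders to adapt:

    # Recreate the container with a larger heap; keep ES_HEAP_SIZE
    # at or below roughly 50% of the RAM available to the container.
    docker run -d --name elasticsearch \
      -p 9200:9200 -p 9300:9300 \
      -e ES_HEAP_SIZE=2g \
      elasticsearch:2.4.4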
My second guess is high load. It seems that you have had too many requests to Elasticsearch recently. You can check the size of the request queues via /_cat/thread_pool?v.
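For example, assuming the default HTTP port on the Elasticsearch host (the column list is optional; steadily growing queue or rejected counts indicate overload):

    # Show queue depth and rejections for the search and bulk pools
    curl -s 'http://localhost:9200/_cat/thread_pool?v&h=host,search.queue,search.rejected,bulk.queue,bulk.rejected'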
You have two solutions for this situation: first, decrease the request rate; second, add a node and add replicas so the load is spread across the cluster.
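Once a second node has joined the cluster, the replica count can be raised per index; my_index is a placeholder name here, and in 2.x no Content-Type header is required:

    # Add one replica per primary shard; the copies will be allocated
    # to the new node and will serve search requests as well.
    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '
    {
      "index": { "number_of_replicas": 1 }
    }'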