Elasticsearch : "failed to get node info for {IP}" and "noNodeAvailableException" in service log

Question

I am facing an issue that I did not have before.

I am attaching the logs of my service and of Elasticsearch (2.4.4):

2020-05-30 06:29:44.576  INFO 24787 --- [generic][T#287]] org.elasticsearch.client.transport       : [Shatter] failed to get node info for {#transport#-1}{172.17.0.1}{172.17.0.1:9300}, disconnecting...

org.elasticsearch.transport.ReceiveTimeoutTransportException: [][172.17.0.1:9300][cluster:monitor/nodes/liveness] request_id [10242] timed out after [5000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:698) ~[elasticsearch-2.4.4.jar!/:2.4.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_242]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]

Elasticsearch logs:

[2020-05-30 06:29:46,784][INFO ][monitor.jvm              ] [Tempo] [gc][old][230125][41498] duration [8.2s], collections [1]/[9s], total [8.2s]/[10.7h], memory [473.2mb]->[426.1mb]/[494.9mb], all_pools {[young] [131.8mb]->[84.7mb]/[136.5mb]}{[survivor] [0b]->[0b]/[17mb]}{[old] [341.3mb]->[341.3mb]/[341.3mb]}
[2020-05-30 06:33:47,782][INFO ][monitor.jvm              ] [Tempo] [gc][old][230340][41540] duration [7s], collections [1]/[7.8s], total [7s]/[10.7h], memory [493.3mb]->[425mb]/[494.9mb], all_pools {[young] [136.5mb]->[83.6mb]/[136.5mb]}{[survivor] [15.4mb]->[0b]/[17mb]}{[old] [341.3mb]->[341.3mb]/[341.3mb]}
[2020-05-30 06:37:59,384][INFO ][monitor.jvm              ] [Tempo] [gc][old][230569][41582] duration [6.9s], collections [1]/[7.2s], total [6.9s]/[10.7h], memory [494.8mb]->[424.7mb]/[494.9mb], all_pools {[young] [136.5mb]->[83.4mb]/[136.5mb]}{[survivor] [16.9mb]->[0b]/[17mb]}{[old] [341.3mb]->[341.3mb]/[341.3mb]}

I do not face this issue in my development environment, but I get it when I deploy on EC2. In addition, when I restart Elasticsearch, it works absolutely fine with no issues, but after 10-15 minutes or less, depending on the number of search or insertion queries, the error message appears.

Also, the storage space on my instance is 74% consumed (94G out of 120G).
Can it be because of memory?
I am pretty sure my res-client code is fine, as it has been working in production for a long time.
Can it be a port issue? I am using a Docker container for Elasticsearch.

Any help will be appreciated.

_cat/fielddata?v
[screenshot not preserved]

_cat/nodes?v
[screenshot not preserved]

Answer 1

Score: 1

I think the heap size of your Elasticsearch node is very low. My best guess is that increasing the heap size will solve the problem.
As for why this is happening now, I think it is because the volume of data has grown over time.
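
To make the heap guess more concrete, here is a minimal Python sketch that reads one of the monitor.jvm lines from the question and reports how full the old generation still is after a collection, and how long the pause was. The helper names (to_bytes, pool_usage, old_gen_fill_after_gc, gc_pause_seconds) are made up for this illustration, and the regular expressions only cover the log format shown above, so treat it as a rough check rather than a general GC log parser.

```python
import re

# One GC line copied from the Elasticsearch log in the question.
GC_LINE = (
    "[2020-05-30 06:29:46,784][INFO ][monitor.jvm              ] [Tempo] "
    "[gc][old][230125][41498] duration [8.2s], collections [1]/[9s], "
    "total [8.2s]/[10.7h], memory [473.2mb]->[426.1mb]/[494.9mb], "
    "all_pools {[young] [131.8mb]->[84.7mb]/[136.5mb]}"
    "{[survivor] [0b]->[0b]/[17mb]}"
    "{[old] [341.3mb]->[341.3mb]/[341.3mb]}"
)

# Size suffixes as they appear in the log line.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

# Matches one pool entry such as {[old] [341.3mb]->[341.3mb]/[341.3mb]}.
POOL_RE = re.compile(r"\{\[(\w+)\] \[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]\}")

# Only handles durations logged in seconds, as in the lines above.
DURATION_RE = re.compile(r"duration \[([\d.]+)s\]")


def to_bytes(size):
    """Convert a size such as '341.3mb' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)([a-z]+)", size)
    return float(match.group(1)) * UNITS[match.group(2)]


def pool_usage(line):
    """Return {pool: (before, after, maximum)} in bytes for one GC line."""
    return {
        name: (to_bytes(before), to_bytes(after), to_bytes(maximum))
        for name, before, after, maximum in POOL_RE.findall(line)
    }


def old_gen_fill_after_gc(line):
    """Fraction of the old generation still in use after the collection."""
    _, after, maximum = pool_usage(line)["old"]
    return after / maximum


def gc_pause_seconds(line):
    """Length of the GC pause in seconds."""
    return float(DURATION_RE.search(line).group(1))


if __name__ == "__main__":
    pause = gc_pause_seconds(GC_LINE)
    print(f"old gen after GC: {old_gen_fill_after_gc(GC_LINE):.0%}")
    print(f"GC pause of {pause}s is longer than the 5s client timeout: {pause > 5.0}")
```

For the line above, the old generation is still 100% full after the collection (341.3mb of 341.3mb), and the 8.2s pause is longer than the 5000ms timeout in the client exception. That would be consistent with a heap of roughly 512 MB being too small for this workload. In Elasticsearch 2.x the heap is usually set through the ES_HEAP_SIZE environment variable; how you pass it to your Docker container depends on your image and setup.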

My second guess is high load. It seems Elasticsearch has been receiving too many requests recently. You can check the size of the request queues via /_cat/thread_pool?v.
There are two solutions for this situation: first, reduce the request rate; second, add a node and add replicas.
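
If you want to check the queues from a script instead of by hand, a sketch along these lines might help. It fetches /_cat/thread_pool?v over HTTP and lists every queue or rejected counter that is not zero. The base URL http://localhost:9200 and the function names are assumptions for this example. The parser simply splits on whitespace and looks for columns ending in .queue or .rejected, which I believe matches the default 2.x output, but column names differ between versions, so adjust as needed.

```python
import urllib.request


def parse_cat_table(text):
    """Turn `_cat` output requested with `?v` into a list of dicts, one per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumes no empty cells, so a plain whitespace split lines up with the header.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def busy_pools(rows):
    """Return (node, column, value) for every non-zero queue or rejected counter."""
    hits = []
    for row in rows:
        node = row.get("host") or row.get("ip") or "?"
        for column, value in row.items():
            is_counter = column.endswith((".queue", ".rejected"))
            if is_counter and value.isdigit() and int(value) > 0:
                hits.append((node, column, int(value)))
    return hits


def fetch_thread_pool(base_url="http://localhost:9200"):
    """Fetch the raw table; base_url is an assumption, use your node's HTTP address."""
    url = base_url + "/_cat/thread_pool?v"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling busy_pools(parse_cat_table(fetch_thread_pool())) against a running node returns an empty list when nothing is queued or rejected. Rejected counters that keep growing would point to the high-load case rather than the heap-size case.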

huangapple
  • Posted on May 30, 2020 at 14:53:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/62098906.html