Elasticsearch RestClient Connection reset by peer

Question

In my AWS VPC I have an Elasticsearch cluster with 2 nodes. On top of those nodes I have a load balancer. In the same VPC I have a microservice that accesses Elasticsearch via RestHighLevelClient version 7.5.2.

I create the client in the following manner:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import lombok.Getter;

public class ESClientWrapper {

    @Getter
    private RestHighLevelClient client;

    public ESClientWrapper() throws IOException {
        // Load the Elasticsearch host/port from a properties file.
        FileInputStream propertiesFile = new FileInputStream("/var/elastic.properties");
        Properties properties = new Properties();
        properties.load(propertiesFile);

        // Build the low-level REST client pointing at the load balancer in front of the cluster.
        RestClientBuilder builder = RestClient.builder(new HttpHost(
                properties.getProperty("host"),
                Integer.parseInt(properties.getProperty("port"))
        ));

        this.client = new RestHighLevelClient(builder);
    }
}

When my microservice doesn't receive requests for a long time (12h..), the first request that is sent afterwards (or a few of the following ones) fails with the following error:

2020-09-09 07:03:13.106  INFO 1 --- [nio-8080-exec-1] c.a.a.services.CustomersMetadataService  : Trying to add the following role : {role=a2}
2020-09-09 07:03:13.106  INFO 1 --- [nio-8080-exec-1] c.a.a.e.repositories.ESRepository        : Trying to insert the following document to app-index : {role=a2}
2020-09-09 07:03:13.109 ERROR 1 --- [nio-8080-exec-1] c.a.a.e.dal.ESRepository       : Failed to add customer : {role=a2}


java.io.IOException: Connection reset by peer
        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:871) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
   	....
	....
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.35.jar!/:9.0.35]
        at java.base/java.lang.Thread.run(Thread.java:836) ~[na:na]
Caused by: java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:na]
        at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:na]
        at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[na:na]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245) ~[na:na]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[na:na]
        at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358) ~[na:na]
        at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:231) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:136) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:241) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        ... 1 common frames omitted

2020-09-09 07:06:55.109  INFO 1 --- [nio-8080-exec-2] c.a.a.services.MyService  : Trying to add the following role : {role=a2}
2020-09-09 07:06:55.109  INFO 1 --- [nio-8080-exec-2] c.a.a.e.repositories.ESRepository        : Trying to insert the following document to index app-index: {role=a2}
2020-09-09 07:06:55.211  INFO 1 --- [nio-8080-exec-2] c.a.a.e.dal.ESRepository       : IndexResponse[index=app-index,type=_doc,id=x532323272533321870287,version=1,result=created,seqNo=70,primaryTerm=1,shards={"total":2,"successful":2,"failed":0}]

As you can see, 3 minutes after the failed request the next request was handled successfully by ES. What could be killing the request? I checked the Elasticsearch logs and didn't see any indication of a connection being closed. The microservice is in the same VPC as Elasticsearch, so it isn't passing through any firewall that might kill it.

I found the following issue on GitHub that suggested increasing the default connection timeout, but I'm wondering whether the issue here is really a timeout problem and whether increasing the default timeout is really the best solution.
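
For reference, this is roughly what raising those timeouts looks like with the low-level client's request config callback; a sketch only, where the host, port, and timeout values are placeholders, not recommendations:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

// Placeholder host/port. The defaults being raised here are about 1 s (connect)
// and 30 s (socket) in this client, per the Elasticsearch docs.
RestClientBuilder builder = RestClient.builder(new HttpHost("my-es-lb", 9200));
builder.setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
        .setConnectTimeout(5_000)      // milliseconds
        .setSocketTimeout(60_000));    // milliseconds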

Also, I found this bug opened in their repo regarding the same problem, but without any answers.

UPDATE
I noticed that this also happens when my service has been up for only 10 minutes. My service started and sent a query to ES, and everything worked well. 10 minutes later I sent an insert request and it failed with "Connection reset by peer".

Answer 1

Score: 1

In the end I didn't find a problem in my configuration/implementation. It seems like a bug in the implementation of Elasticsearch's RestHighLevelClient.

I implemented a retry mechanism that wraps the RestHighLevelClient and retries the query if I get the same error. I used Spring's @Retry annotation for this solution.
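
A minimal sketch of such a wrapper, assuming Spring Retry is on the classpath and that its @Retryable annotation is what is meant by @Retry above; the class and method names here are made up for illustration:

import java.io.IOException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Component;

// Hypothetical wrapper: retries an index call when the connection was reset.
@Component
public class RetryingESRepository {

    private final RestHighLevelClient client;

    public RetryingESRepository(RestHighLevelClient client) {
        this.client = client;
    }

    // Retry up to 3 times with a short backoff whenever the low-level client
    // throws an IOException such as "Connection reset by peer".
    @Retryable(value = IOException.class, maxAttempts = 3, backoff = @Backoff(delay = 500))
    public IndexResponse index(IndexRequest request) throws IOException {
        return client.index(request, RequestOptions.DEFAULT);
    }
}

For the annotation to take effect, retry support also has to be switched on with @EnableRetry on a configuration class.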

Answer 2

Score: 1

I was facing the same issue. Everything worked fine, but after some time a single request got refused.

The solution (in my case) was to enable the keep-alive property of the TCP connection:

final RestClientBuilder restClientBuilder = RestClient.builder(...);

restClientBuilder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
        .setDefaultIOReactorConfig(IOReactorConfig.custom()
                .setSoKeepAlive(true)
                .build()));

Found here:
https://github.com/elastic/elasticsearch/issues/65213
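
Wired into a client factory like the ESClientWrapper from the question, that could look roughly like this; a sketch under the assumption that IOReactorConfig comes from httpcore-nio, with a made-up factory class and placeholder host/port:

import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

public class KeepAliveESClientFactory {

    // Builds a client with SO_KEEPALIVE enabled on the underlying NIO connections,
    // so idle connections are less likely to be silently dropped by the peer.
    public static RestHighLevelClient create(String host, int port) {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port));

        builder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
                .setDefaultIOReactorConfig(IOReactorConfig.custom()
                        .setSoKeepAlive(true)
                        .build()));

        return new RestHighLevelClient(builder);
    }
}

Note that SO_KEEPALIVE only enables the probes; how soon they start is governed by the operating system (net.ipv4.tcp_keepalive_time on Linux, 7200 s by default), so it may also need lowering if the load balancer's idle timeout is short.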
