RestHighLevelClient get API hangs forever

I am using elasticsearch-rest-client 7.6.2 in my project. In one of our get calls, the request hung and never returned. We took a thread dump and saw that the thread was blocked in org.apache.http.concurrent.BasicFuture; it had been stuck there for more than an hour.

Below is the thread stack for the issue. There seems to be a concurrency issue where the Future is never completed with either success or failure, which causes the thread to wait forever.

"pool-processor-thread-5" #173 prio=5 os_prio=0 tid=0x00007f38fe271000 nid=0x3d33 in Object.wait() [0x00007f3843591000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.http.concurrent.BasicFuture.get(BasicFuture.java:82)
    - locked <0x00000006f2adf070> (a org.apache.http.concurrent.BasicFuture)
    at org.apache.http.impl.nio.client.FutureWrapper.get(FutureWrapper.java:70)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
    at org.elasticsearch.client.RestHighLevelClient.get(RestHighLevelClient.java:742)
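
The stack above shows the caller parked in an untimed Future.get(), so nothing on the calling side ever gives up. As a rough illustration of the alternative, here is a minimal stdlib-only sketch of waiting on a future with a deadline. It does not use the Elasticsearch client at all: BoundedWaitSketch and getWithDeadline are hypothetical names, and the never-completed CompletableFuture only stands in for a request whose response never arrives. With the real client you would presumably have to bridge one of the async methods (such as getAsync) into a future yourself; that wiring is not shown here.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of any Elasticsearch or Apache HttpClient API.
final class BoundedWaitSketch {

    /**
     * Waits at most timeoutMillis for the future. Returns the result if it
     * arrives in time; otherwise cancels the future and returns empty.
     */
    static <T> Optional<T> getWithDeadline(Future<T> future, long timeoutMillis) {
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Nobody completed the future in time: stop waiting instead of hanging.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller and give up on the wait.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a request whose response never comes back.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        Optional<String> hung = getWithDeadline(neverCompleted, 200);
        System.out.println("hung request returned a value: " + hung.isPresent());

        // Stand-in for a request that completed normally.
        CompletableFuture<String> done = CompletableFuture.completedFuture("doc");
        System.out.println(getWithDeadline(done, 200).orElse("none"));
    }
}
```

A deadline like this only frees the calling thread; it does not explain why the response never arrived, so it complements rather than replaces fixing the connection itself.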

Below is the code to create the client.

RestClientBuilder builder = RestClient.builder(new HttpHost(esHost, port, protocol))
    .setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() {
        @Override
        public RequestConfig.Builder customizeRequestConfig(
                RequestConfig.Builder requestConfigBuilder) {
            return requestConfigBuilder.setConnectTimeout(10000)
                .setSocketTimeout(60000);
        }
    });

You can get an infinite wait here if TCP keepalives are not enabled in your client. Even with keepalives enabled, the default idle time before the first probe is typically 2 hours, so you probably want to shorten that too.
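
To make the keepalive suggestion concrete, here is a minimal sketch on a plain java.net.Socket, assuming Java 11 or newer (for Socket.setOption and jdk.net.ExtendedSocketOptions). KeepAliveSketch, enableKeepAlive, and the 60/10/3 values are illustrative assumptions, not settings taken from this thread, and the per-socket tuning options are applied only where the platform reports them as supported. For the REST client itself, my understanding is that the corresponding switch is IOReactorConfig.custom().setSoKeepAlive(true), applied through RestClientBuilder.setHttpClientConfigCallback, with the probe timing then coming from the OS settings; please check that against the version you run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper for illustration; not part of the Elasticsearch client.
final class KeepAliveSketch {

    /** Enables keepalive and, where the platform supports it, shortens the probe timing. */
    static void enableKeepAlive(Socket socket, int idleSeconds, int intervalSeconds,
                                int probeCount) throws IOException {
        socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);

        Set<SocketOption<?>> supported = socket.supportedOptions();
        if (supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            // Idle time before the first keepalive probe is sent.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
        }
        if (supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)) {
            // Delay between successive probes.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
        }
        if (supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            // Unanswered probes tolerated before the connection is dropped.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probeCount);
        }
    }

    /** Configures an unconnected socket and reports whether keepalive is on. */
    static boolean demo() {
        try (Socket socket = new Socket()) {
            enableKeepAlive(socket, 60, 10, 3);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keepalive enabled: " + demo());
    }
}
```

With values like these, a dead peer would be noticed after roughly the idle time plus the probe count times the interval, instead of after hours.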

Thank you very much for the suggestion. I will test with a reduced keepalive time and see if that fixes it. The other thing is that it happened only once and is not easily reproducible.
