Java High Level REST Client does not release connection even though timeouts are set

Hi,

We query large amounts of data from our Elasticsearch server. However, in seemingly random cases the connection created by the High Level REST Client is never released, and this blocks our code flow.

The REST client opens a connection to fetch the data but never returns. We have specified timeouts, as shown below, but the REST client does not time out either.

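RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(httpHost)
        .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
            .setConnectTimeout(30000)            // ms allowed to establish the TCP connection
            .setConnectionRequestTimeout(90000)  // ms allowed to lease a connection from the pool
            .setSocketTimeout(90000))            // max ms of inactivity between two data packets
        .setMaxRetryTimeoutMillis(90000));       // ms allowed across all retries of one request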
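The socket timeout in Apache HttpClient's request config (which the low-level RestClient uses) limits the inactivity between two consecutive data packets, not the total duration of a call, so it alone does not put an upper bound on how long a search can take. As a generic client-side safeguard, not something suggested in this thread, a blocking call can be given an overall deadline with plain java.util.concurrent. This is a minimal sketch assuming the search is wrapped in a Callable; the class and method names are hypothetical, and cancelling the future only interrupts the worker thread, it does not by itself close the underlying HTTP connection.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeadlineCall {

    // Runs a blocking call on a worker thread and waits at most timeoutMillis for it.
    // On timeout the worker is interrupted and TimeoutException reaches the caller,
    // so the calling thread cannot stay blocked indefinitely.
    static <T> T callWithDeadline(ExecutorService executor, Callable<T> call, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; the HTTP connection itself is not closed here.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Hypothetical stand-in for a blocking call such as
            // client.search(searchRequest, RequestOptions.DEFAULT).
            Callable<String> slowSearch = () -> {
                Thread.sleep(5_000);
                return "response";
            };
            try {
                callWithDeadline(executor, slowSearch, 500);
            } catch (TimeoutException e) {
                System.out.println("search did not finish within 500 ms");
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The deadline would normally be set somewhat above the longest expected query time (a few minutes here), so that only genuinely stuck calls are abandoned.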

We have also checked the Elasticsearch server: the query finishes, but the client still does not release the connection. The load on Elasticsearch and on the server is also low when we trigger these queries.

The queries are not complex either; each consists only of filters and a single level of aggregation.

We have observed the thread that triggers the query staying blocked for over 10 hours, whereas the expected time is a few minutes.

When we retry the query later, however, it completes normally, so the problem appears to occur randomly.

Configurations:
Elasticsearch version: 6.5.4
Rest client version: 6.5.4
allow_partial_search_results: false (set both on the client side and at cluster level on the server)

Can you please suggest what we are missing?

Related question: Rest High Level Client : Request timeout is not working

Hi @KULDEEP_SINGH,

Can you explain in more detail how this is blocking your code flow? When the thread that triggers the query is blocked for over 10 hours, where is it blocked (stack trace)?

Also, if you have more detailed connection information showing how you concluded that connections are not released, please share that too.

Hi @HenningAndersen

Thanks for your response.

We don't get any error and hence we don't have any stack trace.

By "blocking the code flow" I mean that our code called the REST client's search method, but the call never returned. It is just stuck there: no timeout, no response, and no error.

=> Find connections in the ESTABLISHED state (not TIME_WAIT or LISTEN)

Command:  sudo netstat -atnp | grep 9200 | grep ESTABLISHED
Response: tcp 0 0 11.0.15.99:56148 11.0.15.220:9200 ESTABLISHED 15107/java

=> Convert the local port number (in this case 56148) to hexadecimal

Command:  printf '%x\n' 56148
Response: db54

=> Look up that port in the kernel's TCP connection table, /proc/net/tcp

Command:  grep -i db54 /proc/net/tcp
Response: 202: 630F000A:DB54 DC0F000A:23F0 01 00000000:00000000 00:00000000 00000000 995 0 1666787418 1 ffff880658a95540 20 4 1 12 16
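The hex conversion and the /proc/net/tcp lookup can also be scripted. This is a minimal Java sketch assuming the Linux /proc/net/tcp layout (hex ADDRESS:PORT pairs, socket inode in the tenth column); the class and method names are hypothetical, and the sample line is the one from the post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ProcNetTcp {

    // Same conversion as `printf '%x\n' 56148`, but upper-case and zero-padded
    // to four digits, which is how ports appear in /proc/net/tcp.
    static String portToHex(int port) {
        return String.format("%04X", port);
    }

    // Decodes the port half of an "ADDRESS:PORT" hex pair, e.g. "DC0F000A:23F0" -> 9200.
    static int portOf(String hexAddressAndPort) {
        String hexPort = hexAddressAndPort.substring(hexAddressAndPort.indexOf(':') + 1);
        return Integer.parseInt(hexPort, 16);
    }

    // Returns the socket inode (10th column) if the entry's local port matches, else -1.
    static long inodeForLocalPort(String line, int localPort) {
        // Columns: sl local_address rem_address st tx_queue:rx_queue tr:tm->when
        //          retrnsmt uid timeout inode ...
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 10 || portOf(cols[1]) != localPort) {
            return -1;
        }
        return Long.parseLong(cols[9]);
    }

    public static void main(String[] args) throws IOException {
        int localPort = 56148; // local port reported by netstat
        System.out.println("hex port: " + portToHex(localPort)); // hex port: DB54
        Path table = Paths.get("/proc/net/tcp");
        if (!Files.exists(table)) {
            return; // not running on Linux
        }
        List<String> lines = Files.readAllLines(table);
        // The first line is the column header; every other line is one IPv4 TCP socket.
        for (String line : lines.subList(1, lines.size())) {
            long inode = inodeForLocalPort(line, localPort);
            if (inode >= 0) {
                System.out.println("inode: " + inode);
            }
        }
    }
}
```

Applied to the line above, the remote port 23F0 decodes to 9200 (Elasticsearch) and the inode is 1666787418, which is the value used in the next step.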

=> Find the file descriptor that refers to this socket's inode (in this case 1666787418, the tenth column above). 15107 is the PID found in the first step.

Command:  sudo ls -lt --full-time /proc/15107/fd | grep 1666787418
Response: lrwx------. 1 wildfly wildfly 64 2019-12-04 22:44:45.920523944 +0000 1199 -> socket:[1666787418]
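The fd lookup and timestamp check can likewise be done from Java with java.nio.file. This sketch assumes read access to /proc/&lt;pid&gt;/fd of the target process (the post used sudo for this); the class and method names are hypothetical. As with ls -lt, the symlink's modification time is only an approximation of when the socket was opened.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

public class SocketFdAge {

    // For sockets, /proc/<pid>/fd/<n> is a symlink whose target reads "socket:[<inode>]".
    static boolean isSocketWithInode(String linkTarget, long inode) {
        return linkTarget.equals("socket:[" + inode + "]");
    }

    // Prints every fd of `pid` that points at the given socket inode, together with
    // the symlink's own modification time (what `ls -lt --full-time` shows).
    static void printSocketFds(long pid, long inode) throws IOException {
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // the fd was closed while the directory was being listed
                }
                if (isSocketWithInode(target, inode)) {
                    FileTime t = Files.getLastModifiedTime(fd, LinkOption.NOFOLLOW_LINKS);
                    System.out.println("fd " + fd.getFileName() + " -> " + target + " at " + t);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: SocketFdAge <pid> <inode>, e.g. 15107 1666787418");
            return;
        }
        printSocketFds(Long.parseLong(args[0]), Long.parseLong(args[1]));
    }
}
```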

=> The timestamp on the fd entry tells us approximately when the socket connection was opened.

We found that this socket had been open for more than 11 hours.
On the Elasticsearch server, we checked the task list using the Task Management API, and nothing was running at that time. So it seems that the request completed on the Elasticsearch side, but the REST client somehow got stuck and never returned a response.

Please let us know whether this information helps, or whether there is anything specific we can provide.

Hi @KULDEEP_SINGH,

You should be able to use jstack <pid> to get a stack trace of your client Java program while the thread hangs, to see whether it is stuck somewhere specific.
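If running jstack against the process is inconvenient, a similar dump can be produced from inside the JVM through ThreadMXBean, for example from a hypothetical watchdog that fires when a search exceeds its expected duration. This is a sketch, not a replacement for jstack, and the class name is hypothetical. It walks every frame explicitly because ThreadInfo.toString() truncates long stacks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {

    // Builds a jstack-like text dump of all live threads in the current JVM.
    static String dumpThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (true, true): also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```

A thread stuck inside the client would show up with its full call chain, which is the information the stack trace question above is after.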

Also, note that the max retry timeout was removed in 7.0 because it caused numerous issues.
I would suggest raising the max retry timeout to something like 3x the socket timeout (270000 ms instead of 90000 ms with your settings), though I think that symptom does not clearly match your description.
