SocketTimeoutException within Java High Level Client

I'm building an application with Spring Boot and the official Elastic REST High Level Client, and it seems that the connection to the Elastic server eventually dies if left unused for some time.

In the application I use a single RestHighLevelClient from within a Spring controller, shared by all the endpoints that need to query Elastic. When the application has not been used for some time (a few hours), almost all queries to Elastic fail with the exception: 30,000 milliseconds timeout on connection http-outgoing-0 (the 30-second timeout was set on purpose). The application must be restarted before it can query Elastic again.

From what I tried out, I think I can temporarily work around this by pinging the Elastic server periodically from a background thread. The connection never seems to die this way, but it does not seem right that the RestHighLevelClient cannot reconnect to the Elastic server without this workaround.

It is also important to clarify that the Elastic server is actually a 3-node cluster with one main node, which I use as the single entry point for all queries (I will try listing all the nodes of the cluster when creating the RestHighLevelClient object, though).

Is there maybe a better solution to this issue that does not involve workaround trickery? From what I read, the RestHighLevelClient uses an HTTP connection pool. Could it perhaps be configured to use one-off HTTP requests, as would normally be done when not using this library?

(I found this topic with the same problem and a similar solution - but no replies: RestHighLevelClient SocketTimeoutException)


There's no idle timeout within Elasticsearch (neither the server nor the client) but it's entirely possible that something else on the path between them is destroying connections it believes to be dead because they've been idle for too long.

By default today the Java client doesn't enable TCP keepalives, but enabling them (and configuring them appropriately for your network) is usually a good start.
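A minimal sketch of enabling keepalives when building the client, via the low-level RestClient's HTTP client callback (the host name and port are placeholders; the class and method names here are illustrative, not from the thread):

```java
import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class EsClientFactory {

    // "es-node-1" and port 9200 are placeholders for your cluster's entry point.
    public static RestHighLevelClient create() {
        return new RestHighLevelClient(
            RestClient.builder(new HttpHost("es-node-1", 9200, "http"))
                .setHttpClientConfigCallback(httpClientBuilder ->
                    httpClientBuilder.setDefaultIOReactorConfig(
                        IOReactorConfig.custom()
                            // Enable SO_KEEPALIVE on the pooled connections
                            .setSoKeepAlive(true)
                            .build())));
    }
}
```

Note that SO_KEEPALIVE only switches the mechanism on; the probe timing (tcp_keepalive_time and friends) still comes from the operating system, so tune those sysctls to be shorter than whatever on your network is dropping idle connections.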


I don't think this is the case either, but you may need to send a request on every pooled connection and wait for each of them to time out before the connections are re-established. It's possible you even need to wait for the full TCP retransmission timeout (defaults to 15 minutes but a shorter timeout is recommended).
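For a rough sense of where the ~15-minute figure comes from: with Linux's default tcp_retries2 = 15, retransmission intervals roughly double from the ~200 ms minimum RTO up to a 120 s cap before the connection is declared dead. A sketch of that arithmetic (the constants are idealized; real timing depends on the measured RTT):

```java
public class TcpRetransmitTimeout {

    // Sum the retransmission intervals: exponential backoff from rtoMin,
    // doubling each attempt, capped at rtoMax, for (retries + 1) sends.
    static double totalTimeoutSeconds(int retries, double rtoMin, double rtoMax) {
        double rto = rtoMin;
        double total = 0.0;
        for (int attempt = 0; attempt <= retries; attempt++) {
            total += rto;
            rto = Math.min(rto * 2, rtoMax);
        }
        return total;
    }

    public static void main(String[] args) {
        // Linux defaults: tcp_retries2 = 15, RTO min ~0.2 s, RTO max 120 s
        double total = totalTimeoutSeconds(15, 0.2, 120.0);
        System.out.printf("%.1f seconds (~%.1f minutes)%n", total, total / 60);
    }
}
```

With these defaults the sum comes to about 924.6 seconds, i.e. roughly 15.4 minutes, which is why lowering tcp_retries2 is often recommended.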


Thank you for your reply. I tried what you proposed in the GH issue and it seems to fix my problem, even without changing the keepalive settings on Linux (I might try tweaking them anyway, just in case).

This is what I'd expect if you're leaving the connection unused for over 2 hours (plus 9 * 75 seconds for retries), since the keepalives will detect that the connection has been dropped, which will make the client open a new connection for the next request. However, it's very likely that the connection is actually torn down much sooner (30 and 60 minutes are popular timeouts). I expect you'll find that connections which have been idle for just under 2 hours still give you problems.
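The "over 2 hours plus 9 * 75 seconds" window follows from the standard Linux keepalive defaults (tcp_keepalive_time = 7200 s before the first probe, then tcp_keepalive_probes = 9 unanswered probes spaced tcp_keepalive_intvl = 75 s apart):

```java
public class KeepaliveWindow {

    // Worst-case time before a dead idle connection is detected:
    // idle period before the first probe, plus all probe intervals.
    static int worstCaseSeconds(int keepaliveTime, int keepaliveIntvl, int keepaliveProbes) {
        return keepaliveTime + keepaliveProbes * keepaliveIntvl;
    }

    public static void main(String[] args) {
        // Linux defaults: 7200 s idle, 9 probes, 75 s apart
        int total = worstCaseSeconds(7200, 75, 9);
        System.out.println(total + " seconds"); // 7875 s = 2 h 11 min 15 s
    }
}
```

This is why lowering tcp_keepalive_time (and optionally the probe interval/count) to suit your network is the usual recommendation alongside enabling SO_KEEPALIVE on the client.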