Knowing when and when not to retry a request based on ElasticsearchException or IOException with the RestHighLevelClient

I would like to build retry logic into our Elasticsearch requests, but I cannot find documentation that explains when a request should be resubmitted after a short delay. I would also like to know whether all IOExceptions should be retried or only certain ones.

This is the only documentation that I found describing the errors (src: Bulk API | Java REST Client [6.8] | Elastic)

Synchronous calls may throw an IOException in case of either failing to parse the REST response in the high-level REST client, the request times out or similar cases where there is no response coming back from the server. In cases where the server returns a 4xx or 5xx error code, the high-level client tries to parse the response body error details instead and then throws a generic ElasticsearchException and adds the original ResponseException as a suppressed exception to it.

However, I cannot find any documentation that explains which error codes are safe to retry and which indicate errors that cannot be retried.

Any help would be greatly appreciated. Thanks!

How to handle the errors, and whether you want or need to retry at all, may be specific to your use case.

You can take a look at how Logstash implements retry here: https://github.com/logstash-plugins/logstash-output-elasticsearch/blob/master/lib/logstash/outputs/elasticsearch/common.rb#L225

In summary, Logstash retries anything that is not a 409 (conflict), 400 (bad request), or 404 (missing). Logstash also implements an incremental backoff when retrying after continued failures, which helps keep things healthy, especially when Elasticsearch is pushing back with 429 (too many requests).

This is a pretty good general retry policy. However, if, for example, you know all requests should be authorized, you may not want to retry 401s or 403s.
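To make that policy concrete, here is a minimal Java sketch of the same idea: treat 400, 404 and 409 as permanent failures, retry everything else, and double the delay between attempts up to a cap. This is not the Logstash code and not part of the RestHighLevelClient. `RetrySketch`, `StatusException`, `isRetryable`, `backoffMillis` and `callWithRetry` are all hypothetical names made up for illustration. `StatusException` is only a stand-in for an exception that carries an HTTP status; with the real client you would read the status from the ElasticsearchException you caught (I believe via `status().getStatus()`, but check that against your client version). Retrying every IOException is also an assumption here, so be careful with requests that are not safe to send twice.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;

final class RetrySketch {

    /** Hypothetical stand-in for an exception that carries an HTTP status code. */
    static final class StatusException extends RuntimeException {
        final int status;

        StatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    // Status codes treated as permanent failures, following the Logstash summary above.
    // Add 401 and 403 here if you know every request should already be authorized.
    private static final Set<Integer> NON_RETRYABLE =
            new HashSet<>(Arrays.asList(400, 404, 409));

    static boolean isRetryable(int status) {
        return !NON_RETRYABLE.contains(status);
    }

    /** Delay before the next try: base * 2^(attempt - 1), capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        int shift = Math.min(Math.max(attempt - 1, 0), 20); // bound the shift to avoid overflow
        return Math.min(baseMillis << shift, maxMillis);
    }

    /** Runs the request, retrying retryable failures until maxAttempts is reached. */
    static <T> T callWithRetry(Callable<T> request, int maxAttempts,
                               long baseMillis, long maxMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (StatusException e) {
                // The server answered with an error status: give up on permanent failures.
                if (!isRetryable(e.status) || attempt >= maxAttempts) {
                    throw e;
                }
            } catch (IOException e) {
                // Assumption: no usable response came back (for example a timeout), so try again.
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
            Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
        }
    }
}
```

You would wrap each client call in something like `RetrySketch.callWithRetry(() -> doRequest(), 5, 200, 10_000)`, where `doRequest` is your own method that translates the client's exception into a status code. The attempt count and delays are arbitrary example values, so tune them for your cluster.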
