During chaos testing, we noticed that the elasticsearch-hadoop library failed without retrying when the master node of the Elasticsearch cluster was killed. The logs showed that the call to discover the Elasticsearch version failed:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on  failed; server[https://elasticsearch:9200] returned [503|Service Unavailable:]
    at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:505)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:463)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
    at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:637)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:276)
It appears that retry logic is implemented only for bulk requests. To improve the resiliency of the elasticsearch-hadoop library, could you please add retry logic to all REST calls that the library makes to the Elasticsearch cluster?
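For illustration only, here is a minimal sketch of the kind of retry being requested: the REST call is re-invoked with exponential backoff when the response status looks transient (such as the 503 above, returned while the cluster has no elected master), and fails fast otherwise. RetryingCall, HttpStatusException, and the chosen set of retryable status codes are hypothetical and are not part of the elasticsearch-hadoop API. An actual change would likely need to hook in somewhere around RestClient.execute, which appears in the stack trace above, but that is an assumption.

```java
import java.util.Set;
import java.util.concurrent.Callable;

/** Hypothetical exception carrying the HTTP status of a failed REST call. */
class HttpStatusException extends Exception {
    final int status;

    HttpStatusException(int status, String message) {
        super(message);
        this.status = status;
    }
}

/** Illustrative retry helper with exponential backoff (not part of elasticsearch-hadoop). */
final class RetryingCall {
    // Assumed set of transient statuses, e.g. 503 while a new master is being elected.
    private static final Set<Integer> RETRYABLE = Set.of(429, 502, 503, 504);

    private RetryingCall() {}

    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (HttpStatusException e) {
                // Rethrow non-transient errors, or transient ones once attempts are exhausted.
                if (!RETRYABLE.contains(e.status) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2; // double the wait before the next attempt
            }
        }
    }
}
```

Retrying only on statuses like 503 avoids repeating requests that fail permanently (a 404, for example). By analogy with the existing bulk retry settings es.batch.write.retry.count and es.batch.write.retry.wait, the attempt count and wait time for these calls would presumably also be configurable.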
We are using elasticsearch-hadoop version 5.4.