During chaos testing, we noticed that the elasticsearch-hadoop library failed without retrying when the master node of the Elasticsearch cluster was killed. The logs showed that the call to discover the ES version failed:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on [] failed; server[https://elasticsearch:9200] returned [503|Service Unavailable:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:505)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:463)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:637)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:276)
It seems that retry logic is implemented only for bulk calls. To improve the resiliency of the elasticsearch-hadoop library, could you please add retry logic to all REST calls that it makes to the Elasticsearch cluster?
We are using elasticsearch-hadoop version 5.4.
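
To illustrate the kind of behavior we have in mind, here is a minimal sketch of a generic retry wrapper with exponential backoff. It is not the library's code: `RetrySketch`, `withRetry`, and `TransientException` are hypothetical names, and we assume that a transient failure such as the 503 above could be surfaced as a distinct exception type that the wrapper can catch.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not part of elasticsearch-hadoop.
final class RetrySketch {

    /** Hypothetical marker for failures worth retrying, e.g. an HTTP 503. */
    public static class TransientException extends RuntimeException {
        public TransientException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call and retries it when it throws TransientException.
     * At most maxRetries retries are made after the first attempt, and the
     * wait doubles after each failed attempt, starting at initialDelayMs.
     */
    public static <T> T withRetry(Supplier<T> call, int maxRetries, long initialDelayMs) {
        long delayMs = initialDelayMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (TransientException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted, surface the last failure
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
                delayMs *= 2;
            }
        }
    }
}
```

With something like this, a call such as the version discovery could be wrapped as `RetrySketch.withRetry(() -> fetchVersion(), 3, 500L)`, where `fetchVersion` is a placeholder for the actual REST call. The retry count and initial delay would ideally be configurable, and only failures that are likely to be transient (such as a 503 during master re-election) should be retried.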