I use org.elasticsearch.client.RestClient to bulk-index documents, like this:
response = client.performRequest("POST", "some_endpoint/some_type/_bulk", Collections.emptyMap(), entity);
Then I check if the request was successful:
if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
    throw new RuntimeException("something went wrong when indexing");
}
My observation is that even when the response status code is HttpStatus.SC_OK, there is no guarantee that the documents will eventually be in Elasticsearch.
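A possibly relevant detail: the _bulk endpoint answers with HTTP 200 for the request as a whole and reports per-document failures inside the response body, via a top-level "errors": true flag and an individual "status" on each item (for example 429 with es_rejected_execution_exception when the bulk queue is full). Whether that explains the missing documents here is not confirmed. Below is a minimal sketch of inspecting the body in addition to the status line. The class and method names are hypothetical, and the regex matching is only a stand-in to avoid a JSON dependency; a real JSON parser would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Elasticsearch client.
final class BulkResponseCheck {
    // Top-level flag that _bulk sets when at least one item failed.
    private static final Pattern ERRORS_TRUE =
            Pattern.compile("\"errors\"\\s*:\\s*true");
    // Per-item status, e.g. "status":201 or "status":429.
    private static final Pattern ITEM_STATUS =
            Pattern.compile("\"status\"\\s*:\\s*(\\d{3})");

    // Returns true if the bulk response body reports any failed item.
    static boolean hasItemFailures(String body) {
        return ERRORS_TRUE.matcher(body).find();
    }

    // Counts items whose own status is outside the 2xx range.
    static int countFailedItems(String body) {
        int failed = 0;
        Matcher m = ITEM_STATUS.matcher(body);
        while (m.find()) {
            int status = Integer.parseInt(m.group(1));
            if (status < 200 || status >= 300) {
                failed++;
            }
        }
        return failed;
    }
}
```

The body string could be obtained with org.apache.http.util.EntityUtils.toString(response.getEntity()) and passed to hasItemFailures right after the status code check, so that rejected items can be retried or at least logged.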
I run multiple clients in parallel, each wrapped in a Java Callable. This usually works well for millions of documents.
But sometimes a client throws a java.net.SocketTimeoutException (probably because Elasticsearch cannot handle that many requests at once). When that happens, some documents uploaded by another Callable do not end up in Elasticsearch, even though their upload had finished with status HttpStatus.SC_OK.
It is very concerning to me that I apparently cannot rely on this status code as a guarantee that the documents will eventually be indexed.
Has anybody made similar observations, or does anyone have ideas about what might help in my use case?