Handle Failures in Index and Search API of High Level Client

I am using the Java High Level Client for indexing and searching documents in the AWS ES Cluster.

I want to get answers for some basic conceptual questions:

  1. For the Index API, what are the criteria for a successful IndexResponse? Is an exception thrown if indexing fails on any of the shards (replica/primary)? I know that there is a write consistency level parameter, but I couldn't find it in the high-level client Index API request.

  2. For the Search API, what are the criteria for a successful SearchResponse? Is an exception thrown when searching any of the shards fails?

My main motive is to know how failures from the search and index APIs are generally handled. What are the best practices for this? I couldn't find much info for the high-level client APIs.

On the write consistency parameter: it was removed in Elasticsearch 5.0 in favour of the new wait_for_active_shards parameter. See https://github.com/elastic/elasticsearch/pull/19454 to know more. In any case, consistency never had anything to do with whether the operation succeeded or not; it was just a check, performed before the write operation, that enough shard copies were around.

On knowing whether the index operation was successful or not, and whether there were failures: everything that's exposed via REST (see docs here) is also exposed by the high-level REST client. Either an exception is thrown, meaning that the index operation failed completely, or the response contains the _shards section, which says how many shards in total the operation was supposed to be executed on, how many of them failed, and how many of them performed the operation successfully. The high-level client docs show how to retrieve that section.
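As a rough sketch of how those three counts could be interpreted, the snippet below classifies them in plain Java. The class `IndexShardSummary` and its `Status` enum are hypothetical names made up for illustration. With the real client, the numbers would come from `indexResponse.getShardInfo()` via `getTotal()`, `getSuccessful()` and `getFailed()`.

```java
// Sketch only: a plain-Java stand-in for the counts in the _shards section.
// IndexShardSummary and Status are illustrative names, not client classes.
final class IndexShardSummary {

    enum Status { ALL_COPIES_WRITTEN, SOME_COPIES_UNAVAILABLE, REPLICA_FAILURES }

    // total:      shard copies (primary + replicas) the write should reach
    // successful: copies that performed the write
    // failed:     copies that reported a failure
    static Status classify(int total, int successful, int failed) {
        if (failed > 0) {
            return Status.REPLICA_FAILURES;
        }
        if (successful < total) {
            // No failure was reported, but fewer copies than expected took
            // the write, for example because a replica is not assigned.
            return Status.SOME_COPIES_UNAVAILABLE;
        }
        return Status.ALL_COPIES_WRITTEN;
    }
}
```

Reaching this code at all means no exception was thrown, so the write itself went through. The classification only says how many copies currently hold it.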

On the search API, we currently throw exceptions only in case of total failure, meaning all shards failed. This is not specific to the high-level client though; that's how Elasticsearch works at the REST layer. When you do get a SearchResponse back, you again need to check whether you got partial results or results from all the shards, in the same way as when indexing, by looking at the _shards header. Check out the docs for the search response here to find out how to retrieve such info using the high-level REST client.
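An application that cannot tolerate partial results could apply its own check on top of that. The following is a minimal sketch in plain Java. `SearchShardCheck` and `PartialResultsException` are hypothetical names, and the three counts are assumed to be read from the response with `getTotalShards()`, `getSuccessfulShards()` and `getFailedShards()`.

```java
// Sketch only: application-side policy for partial search results.
// SearchShardCheck and PartialResultsException are illustrative names.
final class SearchShardCheck {

    static final class PartialResultsException extends RuntimeException {
        PartialResultsException(String message) {
            super(message);
        }
    }

    // True when at least one shard did not contribute to the results.
    static boolean isPartial(int totalShards, int successfulShards, int failedShards) {
        return failedShards > 0 || successfulShards < totalShards;
    }

    // Strict policy: turn a partial response into an error.
    static void requireComplete(int totalShards, int successfulShards, int failedShards) {
        if (isPartial(totalShards, successfulShards, failedShards)) {
            throw new PartialResultsException(
                "search answered by " + successfulShards + " of " + totalShards
                    + " shards (" + failedShards + " failed)");
        }
    }
}
```

Whether to call something like `requireComplete` or to accept partial hits and only log the shard failures depends on what matters more to the application: completeness or availability.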


What is the motive behind throwing exceptions only in case of total failure for the search API? If the search succeeds only on a shard with no matching records, but fails on a shard where a matching record is present, shouldn't this be an error?

Also, I wanted to know what happens when the index operation succeeds on a single shard and fails on all the other shards. How is consistency across the shards maintained in this case?

I want to know whether all the shards will eventually be consistent even when the index operation failed on some of them. When should I report failure for my indexing operation?

I want to point out that these are not questions around the Java Client or any client, but rather about how Elasticsearch works.

On throwing exceptions only for total failures, that is just how Elasticsearch works. The idea is to return some results if we have any, and to include in the response some info that tells users that the results are partial. Failures can happen in a distributed system, and this is to some degree a way to be tolerant of them. Of course, which aspect matters most always depends on the application. We've been discussing making this more explicit through a new flag, and changing the default behaviour; see https://github.com/elastic/elasticsearch/issues/28494.

When it comes to an indexing operation, things are quite different. First of all, when you index a document, that document is supposed to go to one single shard, plus all of its replicas. If the indexing operation succeeds on the primary and fails on some replica, we will mark that replica as failed and try to fix it, most likely by recreating it on another node. You will be notified of this through the _shards section that the index API returns. I am simplifying things here, bear with me. On the other hand, if the operation fails on the primary, you will get back an error. The index operation is to be considered failed when an exception is returned.
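That rule (exception means failed, anything else means the primary accepted the write) might be sketched as follows. This is plain Java under stated assumptions: `IndexOutcomeSketch` is a hypothetical name, and the `Callable` stands in for the real index call, returning the number of failed replica copies that the _shards section would report.

```java
import java.util.concurrent.Callable;

// Sketch only: deciding when to report an indexing failure.
// IndexOutcomeSketch is an illustrative name, not part of any client.
final class IndexOutcomeSketch {

    enum Outcome { FAILED, INDEXED, INDEXED_WITH_REPLICA_FAILURES }

    // indexCall is a stand-in for the real index request. It returns the
    // number of replica copies that failed, or throws when the operation
    // failed on the primary.
    static Outcome run(Callable<Integer> indexCall) {
        try {
            int failedReplicas = indexCall.call();
            // The primary has the document. Failed replicas are repaired
            // by the cluster, so this is worth logging, not retrying.
            return failedReplicas > 0
                ? Outcome.INDEXED_WITH_REPLICA_FAILURES
                : Outcome.INDEXED;
        } catch (Exception e) {
            // Only this path should be reported as an indexing failure.
            return Outcome.FAILED;
        }
    }
}
```

For example, `IndexOutcomeSketch.run(() -> 0)` yields `INDEXED`, while a call that throws yields `FAILED`.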

Also, do keep in mind that searching for something through the search API may not return a document that was just indexed, as a refresh may not have happened between the index operation and the search. A refresh is required to make newly indexed documents visible to the search API. That does not mean that such documents are not there though. They are, and this is not a matter of consistency; it is just how the search API works.
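A toy model can make the distinction between "stored" and "visible to search" concrete. The class below is purely illustrative and does not reflect Elasticsearch internals. `RefreshVisibilityModel` and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: documents are stored on index(), but search() sees
// them only after refresh(). Not how Elasticsearch is implemented.
final class RefreshVisibilityModel {

    private final List<String> pending = new ArrayList<>();
    private final List<String> searchable = new ArrayList<>();

    // The document is accepted and stored, but not yet searchable.
    void index(String doc) {
        pending.add(doc);
    }

    // Makes everything indexed so far visible to search.
    void refresh() {
        searchable.addAll(pending);
        pending.clear();
    }

    boolean search(String doc) {
        return searchable.contains(doc);
    }

    // True as soon as the document has been indexed, refreshed or not.
    boolean isStored(String doc) {
        return pending.contains(doc) || searchable.contains(doc);
    }
}
```

In the real client, a write request can be told to wait for visibility with `setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL)`, at the cost of a slower response.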

