Consistency between multiple _search requests

mmakoto · March 16, 2018, 7:58am

Hi, I have some questions about the consistency of Elasticsearch _search API results.

My environment:

Using Elasticsearch 6.1.1
The cluster consists of 3 nodes
number_of_replicas is 1
Our app's logs are indexed to the Elasticsearch cluster.
End users retrieve the logs using the _search API

Question 1:

My understanding of Elasticsearch behavior (when indexing a document) is:

1. A document is indexed to the primary shard.
2. The primary shard forwards the request to all replica shards, and the document is indexed on each of them.
3. After the primary shard receives the results from the replica shards, it returns a response to the client.
4. The primary shard executes "refresh" every 1 second (by default).
5. Each replica shard executes "refresh" every 1 second (by default).
6. The document can be found on a shard by the "_search" API only after that shard has executed "refresh".

There can be a time gap between steps 4 and 5. Therefore, when step 4 is done and step 5 is not, the following situation could happen:

User A executes "_search" and gets the document in the result (if the request goes to the primary shard).
User B executes "_search" (with the same condition) and doesn't get the document in the result (if the request goes to the replica).
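
To make the timing gap concrete, here is a toy Python model of it. This is not Elasticsearch code: the `ShardCopy` class and its `index`, `refresh` and `search` methods are made-up names that only mimic the "indexed but not yet refreshed" state described above.

```python
class ShardCopy:
    """Toy model of one shard copy (hypothetical, not an Elasticsearch API)."""

    def __init__(self, name):
        self.name = name
        self.buffer = []      # documents indexed but not yet refreshed
        self.searchable = []  # documents visible to search

    def index(self, doc):
        # Indexing only puts the document into the unrefreshed buffer.
        self.buffer.append(doc)

    def refresh(self):
        # A refresh makes everything in the buffer visible to search.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, doc):
        return doc in self.searchable


primary = ShardCopy("primary")
replica = ShardCopy("replica")

# Steps 1-3: the document is indexed on the primary and on the replica.
for shard_copy in (primary, replica):
    shard_copy.index("log-1")

# Step 4 has happened, step 5 has not yet.
primary.refresh()
user_a_sees = primary.search("log-1")   # request routed to the primary
user_b_sees = replica.search("log-1")   # request routed to the replica

# Step 5: once the replica refreshes, both copies agree again.
replica.refresh()
user_b_sees_later = replica.search("log-1")
```

In this model, user A finds the document while user B does not, until the replica runs its own refresh.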

I'm considering if and how I can avoid this situation.
(I know user A and user B can eventually see the document after (at most?) 1 sec, though)

(Q1-1) Is my understanding above correct?
It would be great if a _search request went only to the primary while the refresh has been done on the primary but not yet on the replica. (But maybe that is not how it works?)

(Q1-2) If my understanding above is correct, is there any way to avoid that situation?

My thoughts are below (but I couldn't find a solution):

I found the "refresh=wait_for" option for the index API. However, even if I specify "refresh=wait_for", the primary and replica shards execute "refresh" at different times, so the situation doesn't change.
I also found the "wait_for_active_shards" option for the index API. However, this option only checks the number of active shards before indexing.
I also considered the "preference=_primary" option for the _search API. However, I found that this option is deprecated and will be removed in Elasticsearch 7.0.
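
For reference, here is a rough Python sketch of how the request URLs for these options could be built. The host, the index name `app-logs`, the type name `log` and the helper functions are all assumptions made up for illustration. The search helper uses a custom `preference` string (for example a session ID, which must not start with `_`) instead of `_primary`. As far as I understand, that only routes searches sharing the same string to the same shard copies, so it would not by itself make user A and user B agree unless both send the same value.

```python
from urllib.parse import urlencode

ES_HOST = "http://localhost:9200"  # assumption: local test cluster


def index_url(index, doc_type, refresh="wait_for"):
    """URL for indexing a document and waiting until a refresh makes it visible."""
    return f"{ES_HOST}/{index}/{doc_type}?" + urlencode({"refresh": refresh})


def search_url(index, preference):
    """URL for a search that is routed according to a custom preference string."""
    return f"{ES_HOST}/{index}/_search?" + urlencode({"preference": preference})


# Hypothetical index, type and session ID.
example_index_url = index_url("app-logs", "log")
example_search_url = search_url("app-logs", "user-42")
```

The sketch only builds the URLs; sending them (with `POST` for indexing and `GET` or `POST` for searching) is left to whichever HTTP client is in use.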

Question 2:

If a node goes down and later comes back, the shard(s) on that node would no longer be up to date.
My understanding is below:

1. A node "A" goes down.
2. Elasticsearch receives an index API request.
3. The document is indexed to a primary shard (and succeeds, for example).
4. The primary shard forwards the request to all replica shards, and the request fails on one replica shard.
5. The primary shard asks the master node to remove that replica shard from the in-sync replica list.
6. The node "A" comes up again.
7. Elasticsearch receives a _search API request just after the node "A" is up.
   --> This request doesn't go to the node "A", because its shard is not in the in-sync replica list.
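
The sequence above can be written down as a small Python toy model. It only encodes my understanding, including the assumption that searches are routed solely to in-sync copies; it is not how Elasticsearch is actually implemented, and `ReplicationGroup` and its methods are hypothetical names.

```python
class ReplicationGroup:
    """Toy model of a primary, its replicas and the in-sync set (hypothetical)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.in_sync = {primary, *replicas}
        self.down = set()

    def node_down(self, shard_copy):
        self.down.add(shard_copy)

    def node_up(self, shard_copy):
        # Coming back up does NOT put the copy back into the in-sync set.
        self.down.discard(shard_copy)

    def index(self, doc):
        # The primary forwards the write; copies that fail are reported
        # to the master and removed from the in-sync set.
        failed = {c for c in self.in_sync if c in self.down}
        self.in_sync -= failed
        return failed

    def search_targets(self):
        # Assumption: only in-sync copies that are up may serve searches.
        return sorted(c for c in self.in_sync if c not in self.down)


group = ReplicationGroup("P", ["R-A", "R-B"])
group.node_down("R-A")           # step 1
failed = group.index("log-1")    # steps 2-5
group.node_up("R-A")             # step 6
targets = group.search_targets() # step 7
```

In this model the copy on node "A" stays out of `targets` after the node returns, which is the behavior I am asking about.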

Is my understanding correct?
Can I say there is no possibility that the _search API hits stale shards (in this "down" situation)?

system · April 13, 2018, 7:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
