Primary and Replica Shard Sync

mosiddi · February 2, 2016, 11:57am

Let us take a sample scenario in an Azure Elasticsearch setup where the shard, node, and index allocation mapping looks something like the one below:

During a planned Azure update, here is the list of things that will happen:

Couple of Qs -

Will the contents updated on M2's shards (while M1 was down) be synced to M1's shards?
Will the contents updated on M1's shards (while M2 was down) be synced to M2's shards?

mosiddi · February 3, 2016, 3:49am

Ping.

Christian_Dahlqvist · February 3, 2016, 7:20am

If you only have two nodes in your cluster and both are master eligible, you should have minimum_master_nodes set to 2. This means that when the first node goes down, Elasticsearch would stop accepting indexing requests as no master can be elected. The primary and the replica should therefore have the same data.

If you on the other hand had 3 (or more) nodes in the cluster, it would be possible to elect a master after the first node went offline and Elasticsearch will then relocate the missing shard to a node that does not already contain it. It can then continue taking writes.
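To make the 2-node versus 3-node difference concrete, here is a minimal sketch of the majority arithmetic behind minimum_master_nodes. The function names are hypothetical helpers for illustration, not part of any Elasticsearch client; the only assumption is the usual majority rule, floor(n / 2) + 1.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(master_eligible: int, nodes_down: int) -> bool:
    """True if the nodes still running form a majority."""
    remaining = master_eligible - nodes_down
    return remaining >= minimum_master_nodes(master_eligible)


# Two master-eligible nodes: losing one leaves 1 of the required 2.
two_node_ok = can_elect_master(master_eligible=2, nodes_down=1)

# Three master-eligible nodes: losing one leaves 2 of the required 2.
three_node_ok = can_elect_master(master_eligible=3, nodes_down=1)
```

With two master-eligible nodes the cluster cannot elect a master once either node is down, which is why indexing stops in that setup.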

mosiddi · February 3, 2016, 7:57am

Thanks @Christian_Dahlqvist

I intentionally didn't talk about master nodes... Assume these are just 2 data nodes in the ES cluster and master/client nodes are different ones.

Christian_Dahlqvist · February 3, 2016, 8:11am

Assuming you only have 2 data nodes and separate dedicated master nodes so that a master is available at all times, I believe Elasticsearch still will not accept the write while one of the data nodes is down, as it requires a quorum of shards to be available.

mosiddi · February 3, 2016, 8:24am

In this case, the replica count is 1, so the quorum is 1 and not 2. If the primary is available, indexing will succeed.

Note, for the case where the number of replicas is 1 (total of 2 copies of the data), then the default behavior is to succeed if 1 copy (the primary) can perform the write.
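As a rough sketch of the rule quoted above: the write quorum is a majority of the shard copies, except that with 2 copies or fewer a single active copy is enough. The function name is hypothetical, and the formula is my reading of the documented default for the versions current at the time of this thread, so treat it as an assumption rather than the actual implementation.

```python
def required_active_copies(number_of_replicas: int) -> int:
    """Copies that must be active for a write under the default quorum rule.

    Assumed rule: majority of (primary + replicas), but only enforced
    when there are more than 2 copies in total.
    """
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas cannot be negative")
    total_copies = 1 + number_of_replicas
    if total_copies <= 2:
        return 1  # the primary alone is enough
    return total_copies // 2 + 1


# One replica (2 copies in total): the primary alone can accept the write.
one_replica = required_active_copies(1)

# Two replicas (3 copies in total): 2 copies must be active.
two_replicas = required_active_copies(2)
```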

Christian_Dahlqvist · February 3, 2016, 11:11am

If that is the case you would probably run the risk of losing some data if you do not let the cluster settle into a green state before taking the next data node down.
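One way to act on this during a rolling update is to check the cluster health response before stopping the next data node. The sketch below only inspects a response that has already been fetched; the function name and the sample dictionaries are hypothetical, while the field names (status, unassigned_shards, initializing_shards, relocating_shards) are the ones the cluster health API returns.

```python
def safe_to_stop_next_node(health: dict) -> bool:
    """True only if the cluster is green and no shard is still moving or recovering."""
    return (
        health.get("status") == "green"
        and health.get("unassigned_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("relocating_shards", 0) == 0
    )


# Hypothetical responses: M1 has just rejoined and its replica is still recovering...
still_recovering = {
    "status": "yellow",
    "unassigned_shards": 0,
    "initializing_shards": 1,
    "relocating_shards": 0,
}

# ...and later, once every copy is back in sync.
settled = {
    "status": "green",
    "unassigned_shards": 0,
    "initializing_shards": 0,
    "relocating_shards": 0,
}
```

Only when the check passes for the current cluster state would it be reasonable to take M2 down.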

mosiddi · February 3, 2016, 11:31am

Agreed. I would like to know how ES resolves the data inconsistency between shards in this case.

Christian_Dahlqvist · February 3, 2016, 11:46am

Elasticsearch will take one of the shards, so any data written only to the other will be lost.
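A toy model of that outcome, for illustration only. It treats each shard copy as a set of document IDs and is not how Elasticsearch recovery is actually implemented; the document IDs and the rejoin function are made up.

```python
def rejoin(chosen_copy: set, other_copy: set) -> tuple:
    """Return (surviving docs, lost docs) when chosen_copy overwrites other_copy."""
    lost = other_copy - chosen_copy
    return set(chosen_copy), lost


# Both copies start in sync.
m1 = {"doc1", "doc2"}
m2 = {"doc1", "doc2"}

# M1 is down: the write lands only on M2.
m2 |= {"doc3"}

# M2 goes down before M1 has caught up: the next write lands only on M1.
m1 |= {"doc4"}

# M2 rejoins and, in this toy model, M1's copy is the one that is kept.
surviving, lost = rejoin(chosen_copy=m1, other_copy=m2)
```

Here doc3 was written only to the copy that was not chosen, so it ends up in lost.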

mosiddi · February 3, 2016, 12:26pm

Thanks @Christian_Dahlqvist !

warkolm · February 3, 2016, 10:53pm

FYI we are all volunteers here, even those that work for Elastic. If you want SLA based response times then you should look at a Subscription with Elastic.

Otherwise, please be patient and respect your fellow community members.

mosiddi · February 4, 2016, 3:36am

Point noted. Thanks.
