Swapping primary and replica shard allocations in two node clusters

vigyas · May 14, 2019, 2:12pm

Suppose there is a two node cluster and indices have 1 replica. Initial shard allocation distributes primaries and replicas across both nodes.

node-1: [p0, p2, p4, p6, p8, r1, r3, r5, r7, r9]
node-2: [p1, p3, p5, p7, p9, r0, r2, r4, r6, r8]

If one node goes down, all primaries get allocated on the remaining node (i.e. their replicas get promoted):

node-1: [p0, p2, p4, p6, p8, p1, p3, p5, p7, p9]

When node-2 finally comes back up, it ends up with all the replica shards, while all primaries stay on node-1:

node-1: [p0, p2, p4, p6, p8, p1, p3, p5, p7, p9]
node-2: [r0, r2, r4, r6, r8, r1, r3, r5, r7, r9]

Is there any way to bring this cluster back to having both primaries and replicas across both nodes without adding a third node or temporarily reducing replica count?

We don't want all primaries on one node, because primaries put more load on a node in the update-heavy workloads we commonly run. (https://github.com/elastic/elasticsearch/issues/41543)
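To confirm the skew described above, one option is to count started primaries per node. The following is a minimal sketch, assuming the rows come from `GET _cat/shards?format=json` (each row carrying `prirep`, `state` and `node` fields); the index name `my-index` and the sample rows are hypothetical and only mirror the layout in this post.

```python
from collections import Counter


def primaries_per_node(cat_shards_rows):
    """Count STARTED primary shards per node.

    Assumes each row is a dict parsed from GET _cat/shards?format=json,
    with "prirep" ("p" or "r"), "state" and "node" keys.
    """
    counts = Counter()
    for row in cat_shards_rows:
        if row.get("prirep") == "p" and row.get("state") == "STARTED":
            counts[row["node"]] += 1
    return dict(counts)


# Hypothetical rows mirroring the layout after node-2 rejoins:
# all ten primaries on node-1, all ten replicas on node-2.
rows = [
    {"index": "my-index", "shard": str(i), "prirep": "p",
     "state": "STARTED", "node": "node-1"}
    for i in range(10)
] + [
    {"index": "my-index", "shard": str(i), "prirep": "r",
     "state": "STARTED", "node": "node-2"}
    for i in range(10)
]

print(primaries_per_node(rows))  # {'node-1': 10}
```

A node that holds no primaries is simply absent from the result, which is how node-2 shows up here.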

vigyas · May 14, 2019, 3:13pm

To add some context -- is there a way to force-unallocate a replica, move its primary to node-2, and then allocate the replica on node-1?

Is this similar (in terms of cluster overhead) to reducing the replica count, letting the primaries rebalance, and then increasing the replica count back on an index?

DavidTurner · May 14, 2019, 4:06pm

If you manually cancel the allocation of the primary then the replica will be promoted and then the old primary will be reassigned as a replica and brought back into sync. This isn't a great solution because the shard lacks redundancy while the primary is restarting as a replica.

There is no mechanism for "demoting" a primary shard back to a replica without needing to restart it.
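As a rough illustration of the manual cancel described above, the sketch below builds a request body for the cluster reroute API (`POST /_cluster/reroute`) using its `cancel` command. The helper name `cancel_primary_command` and the index name `my-index` are hypothetical; the sketch only builds and prints the JSON and does not send it.

```python
import json


def cancel_primary_command(index, shard, node):
    """Build a reroute body that cancels one primary's allocation."""
    return {
        "commands": [
            {
                "cancel": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Without this flag the cancel command only
                    # applies to replica shard copies.
                    "allow_primary": True,
                }
            }
        ]
    }


# Hypothetical example: cancel primary 1 of "my-index" on node-1 so
# that its replica on node-2 gets promoted.
body = cancel_primary_command("my-index", 1, "node-1")
print(json.dumps(body, indent=2))
```

Because the shard has no redundancy while the cancelled copy recovers as a replica, it seems prudent to cancel one shard at a time and wait for the cluster to return to green before the next.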

vigyas · May 14, 2019, 5:28pm

Thanks @DavidTurner. I understand the redundancy risk. Does this have less overhead than reducing the replica count and then increasing it? (I guess reducing replicas and bringing them back will move all the data for the primaries, then bring back the replicas, which would again be a peer recovery?)

"If you manually cancel the allocation of the primary"

I assume you are referring to CancelAllocationCommand, right?

DavidTurner · May 14, 2019, 9:32pm

Yes, I think so. If you reduce the replica count then it's likely that the data for the unneeded replicas will actually be deleted from disk, so Elasticsearch will need to completely rebuild these shard copies to rebalance the cluster (i.e. copy the whole shard over the network) and then completely rebuild every shard copy again when you increase the replica count at the end of the process.

OTOH if you cancel the primary's allocation then nothing will be deleted, and then that copy can often be rebuilt as a replica by copying only any missing operations.

And yes, I am referring to CancelAllocationCommand.
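For comparison, the replica-count route discussed in this thread amounts to two index settings updates with a rebalance in between. This is a minimal sketch of the request bodies only, assuming the update index settings API (`PUT /<index>/_settings`); the helper `replica_count_settings` and the index name `my-index` are hypothetical, and nothing is sent to a cluster.

```python
import json


def replica_count_settings(count):
    """Build a settings body that sets the number of replicas."""
    return {"index": {"number_of_replicas": count}}


# Step 1 drops the replicas (their data is likely deleted from disk);
# step 2 restores them, which rebuilds each replica from its primary.
steps = [
    ("PUT", "/my-index/_settings", replica_count_settings(0)),
    ("PUT", "/my-index/_settings", replica_count_settings(1)),
]

for method, path, settings in steps:
    print(method, path, json.dumps(settings))
```

Between the two steps the primaries would have to rebalance across both nodes, which is where the full-shard copies mentioned above come from.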

