Do replica shards get deleted automatically after reallocation with the Cluster Reroute API when nodes go down?

Hello,
We are manually reallocating shards to active nodes using the Cluster Reroute API after one or more nodes go down and indices turn red. We find the unassigned shards of the red indices and allocate the primary shards to a currently active node, and ES then creates the replica shards automatically. We want to ensure that we don't have any stale data lying around. The Elasticsearch documentation on the Cluster Reroute API states the following:

The allow_primary parameter will force a new empty primary shard to be allocated without any data. If a node which has a copy of the original primary shard (including data) rejoins the cluster later on, that data will be deleted: the old shard copy will be replaced by the new live shard copy.

https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html
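
For reference, here is a minimal sketch of the request body we send for this. It assumes the 1.x `allocate` command form with `allow_primary`, sent as a POST to `/_cluster/reroute`; the index name `logs`, shard number and node name `node-c` are made up for illustration, and the helper name `allocate_empty_primary` is our own. Other ES versions may use a different command shape.

```python
import json


def allocate_empty_primary(index, shard, node):
    """Build a reroute body that forces a new, empty primary onto `node`.

    Assumes the 1.x `allocate` command; `allow_primary` means any data
    from the lost primary is NOT recovered.
    """
    return {
        "commands": [
            {
                "allocate": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "allow_primary": True,
                }
            }
        ]
    }


# Hypothetical values, for illustration only.
body = allocate_empty_primary("logs", 1, "node-c")
print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the body of the POST to `/_cluster/reroute`.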

Our question is:
If a node that has a copy of the original replica shard (including data) rejoins the cluster later on, does that data get deleted as well?
In other words:
Does ES clean up data for both primary and replica shards if they have been reallocated and the original node containing those shards later comes back into the cluster?

Why are you having to do this manually? ES should just manage this transparently.

Depends what version you are on.

ES seems to rebalance as long as at least one copy of a shard is available. For instance, suppose an index is configured to have two copies of each shard (one primary and one replica). If both the primary and the replica are lost because two different nodes go down before ES has a chance to rebalance, the index goes red and never recovers. We found that during our testing. We are manually reallocating the primary shard onto an active node, and ES then creates a replica shard automatically. These have no data.
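
In case it helps, this is roughly how the "find the unassigned primaries" step can be sketched. It assumes the plain-text output of `GET /_cat/shards` with its default column order (index, shard, prirep, state, ...); the sample rows and the function name `unassigned_primaries` are made up for illustration.

```python
def unassigned_primaries(cat_shards_text):
    """Return (index, shard) pairs for primaries listed as UNASSIGNED."""
    found = []
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # Assumed columns: index shard prirep state [docs store ip node]
        if len(cols) >= 4 and cols[2] == "p" and cols[3] == "UNASSIGNED":
            found.append((cols[0], int(cols[1])))
    return found


# Hypothetical _cat/shards output: shard 1 of "logs" lost both copies.
sample = """\
logs 0 p STARTED 120 1.2mb 10.0.0.1 node-a
logs 0 r STARTED 120 1.2mb 10.0.0.2 node-b
logs 1 p UNASSIGNED
logs 1 r UNASSIGNED
"""

print(unassigned_primaries(sample))
```

Only the primaries are collected, since the replicas get created by ES once a primary is allocated.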

We are on 1.7.3

Then that's expected. ES cannot recover data from a shard that no longer exists anywhere in the cluster.

What is the behavior if:

  1. Both primary and replica shards are lost because two nodes (1 and 2) go down.
  2. Both are reassigned to active nodes using the Cluster Reroute API.
  3. Nodes 1 and 2, which were down, rejoin the cluster.

The ES documentation says that the primary shard on a node rejoining the cluster will be deleted, but it does not mention anything about the replica shard.
Does the replica shard get deleted as well?

Yep, it will, because there will be an existing replica.

Where is that in the docs? We can get it updated.

It is in the Warning section on the following page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html

Thanks a ton Mark :slight_smile:

FYI https://github.com/elastic/elasticsearch/issues/16113