What is the allocation process when a primary shard goes down?

(Yehosef) #1

I've seen lots of people say they want primary shards evenly distributed, or kept off certain nodes, and the usual answer is "why? It doesn't matter, the replica does the same work as the primary".

But I'm curious about the effect of a primary shard going down compared with a replica. What happens when a primary shard goes down? What happens to writes that arrive while the cluster is reallocating, and how intensive is the reallocation process?

(David Pilato) #2

When a primary goes down, the master node automatically promotes one of the replicas to primary.
Then the master node allocates a new replica somewhere in the cluster, and the data are copied over the wire to it.

Write operations are still possible during this time because the former replica, now the primary, is still in the cluster (the index is in yellow state until the new replica is allocated).
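To make the sequence above concrete, here is a minimal toy model of a single shard with one primary and one replica. It is not Elasticsearch's allocation code: the names ShardCopy, fail_node, allocate_replica and health are hypothetical, and cluster health is reduced to a simple rule (red without a started primary, yellow while fewer copies than expected are started, green otherwise).

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    node: str
    primary: bool
    started: bool = True  # False while a newly allocated replica is still recovering


def fail_node(copies, failed_node):
    """Drop the copy on the failed node; promote a replica if the primary was lost."""
    survivors = [c for c in copies if c.node != failed_node]
    if survivors and not any(c.primary for c in survivors):
        # Toy version of the master promoting one of the remaining started replicas.
        for c in survivors:
            if c.started:
                c.primary = True
                break
    return survivors


def allocate_replica(copies, target_node):
    """Assign a new replica to another node; it starts out recovering (not started)."""
    return copies + [ShardCopy(node=target_node, primary=False, started=False)]


def health(copies, expected_copies):
    """Simplified health rule for a single shard (assumption of this sketch)."""
    if not any(c.primary and c.started for c in copies):
        return "red"
    if sum(c.started for c in copies) < expected_copies:
        return "yellow"
    return "green"


copies = [ShardCopy("node-1", primary=True), ShardCopy("node-2", primary=False)]
print(health(copies, 2))                     # green
copies = fail_node(copies, "node-1")         # node holding the primary goes down
print(copies[0].primary, health(copies, 2))  # True yellow
copies = allocate_replica(copies, "node-3")  # new replica allocated, data being copied
print(health(copies, 2))                     # yellow
copies[-1].started = True                    # recovery finished
print(health(copies, 2))                     # green
```

In this model writes can be accepted whenever health is not red, which is why the index stays writable while it is yellow.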

(Yehosef) #3

Thanks for the answer!

But I'm still not 100% clear. I have a working cluster with a primary and a replica shard. When the primary goes down, how does the master know? I assume there is some delay before that node is marked as "down" and the replica is promoted, no? Otherwise a short network pause would cause failovers. How long is that delay, and what happens during that window?

(David Pilato) #4

First, most of the time a shard becomes unavailable because the node holding it stops.
The master regularly pings all nodes to check that they are still alive, every second by default: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#fault-detection
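The page linked above lists the Zen fault detection defaults: discovery.zen.fd.ping_interval (1s), discovery.zen.fd.ping_timeout (30s) and discovery.zen.fd.ping_retries (3). The sketch below is a simplified simulation of how those three settings interact when pings go unanswered, as during a network pause. The timing rule it uses (each unanswered ping costs the full timeout, then one interval passes before the next ping) is an assumption of this sketch, not a statement about the exact implementation, and time_to_mark_failed is a hypothetical name.

```python
def time_to_mark_failed(ping_results, interval=1.0, timeout=30.0, retries=3):
    """Simulated seconds until a node is considered failed, or None if it never is.

    ping_results is a sequence of booleans: True means the node answered the ping.
    Assumption: a successful ping takes no time, an unanswered one waits `timeout`.
    """
    elapsed = 0.0
    consecutive_failures = 0
    for ok in ping_results:
        if ok:
            consecutive_failures = 0  # any answered ping resets the failure count
        else:
            elapsed += timeout        # wait the full timeout for the missing answer
            consecutive_failures += 1
            if consecutive_failures >= retries:
                return elapsed        # node is now treated as failed
        elapsed += interval           # wait before sending the next ping
    return None


# Node stops answering entirely: declared failed after three unanswered pings.
print(time_to_mark_failed([False, False, False]))                 # 92.0
# A short blip never reaches three consecutive failures, so no failover.
print(time_to_mark_failed([True, False, True, False, False, True]))  # None
```

This is the reason a brief network pause does not trigger a failover: the count resets as soon as one ping is answered.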

When you send an index request, you send it to a coordinating node, which first tries to reach the primary shard. If the node holding the primary is down, the coordinating node keeps retrying for up to one minute by default (the timeout parameter of the index API). See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#timeout
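The retry-until-timeout behaviour described above can be sketched as follows. This is a simplified client-side model, not the coordinating node's actual code: PrimaryUnavailable, index_with_timeout and FakeClock are hypothetical names, and the fixed polling delay is a simplification chosen for this sketch. In a real request the wait is set per request with the timeout query parameter, for example ?timeout=5m.

```python
import time


class PrimaryUnavailable(Exception):
    """Hypothetical signal that the primary shard cannot be reached yet."""


def index_with_timeout(send_to_primary, timeout=60.0, retry_delay=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Retry send_to_primary until it succeeds or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        try:
            return send_to_primary()
        except PrimaryUnavailable:
            if clock() + retry_delay > deadline:
                raise TimeoutError(f"primary not available within {timeout}s")
            sleep(retry_delay)  # simplification: poll at a fixed delay


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
attempts = []


def send():
    attempts.append(clock.now)
    if clock.now < 5:  # hypothetical: a replica is promoted 5 simulated seconds in
        raise PrimaryUnavailable()
    return "created"


print(index_with_timeout(send, clock=clock, sleep=clock.sleep))  # created
print(len(attempts))                                             # 6
```

If the primary never comes back within the timeout, the caller gets an error instead of a silent hang, which matches the "respond with an error after one minute" default.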

(Yehosef) #5

Thanks for the additional information, it's very helpful.

What happens during the minute before the timeout? Is the request queued at the coordinating node? If so, what is the queue size?
