We recently ran into an issue where all copies (1 primary and 2 replicas) of a shard of an index were in an unallocated state. This happened when the data nodes had master connectivity issues (each data node lost connectivity with the master once, at different times).
Our hunch is that something like the following happened.
We have 3 data nodes: D11, D12 and D13. One of them held the primary and the other two held the replica shards.
From the logs, this is the sequence of events that could have led to this situation:
- Time T1:
** Node D11 had the primary and nodes D12 and D13 had the replica copies
- Time T2:
** Node D12 had a network issue due to which it was not able to ping the master (M12) for almost a minute. Once it could talk to the master again, it started the initialization process for all shards again.
- Time T3:
** The replica initialization on node D12 appeared to be stuck. This could be because it was initializing from node D11, which itself had left the cluster.
** Node D11 had a network issue due to which it was not able to ping the master (M12) for almost a minute. Once it could talk to the master again, it started the initialization process for all shards again.
** In the meantime, the cluster promoted D13's copy to primary
** When node D11 came back, its copy was marked failed because it had held the failed primary, and the copy was marked as a replica
** At this stage, we had 2 failed replicas
- Time T4:
** Node D13 had a network issue due to which it was not able to ping the master (M12) for almost a minute.
** The cluster service would have tried to promote one of the other copies to primary, but both were in a failed state, and hence none of the shard copies was available
** When node D13 rejoined, the shard was already marked as failed, and hence the cluster did not explicitly retry the allocation
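To make the hypothesis easier to check, here is a toy Python model of the sequence above. It is only a sketch of our understanding, not the cluster's actual allocation logic: `ShardCopy`, `master_blip` and `run_timeline` are made-up names, and the state transitions simply encode the assumptions listed in the timeline.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    role: str   # "primary" or "replica"
    state: str  # "started", "initializing" or "failed"


def summarize(copies):
    """Return a plain snapshot: node name -> (role, state)."""
    return {node: (c.role, c.state) for node, c in copies.items()}


def master_blip(copies, node):
    """Model one node losing and then regaining its master connection."""
    copy = copies[node]
    if copy.role == "replica":
        # Assumption: a rejoining replica restarts recovery from the primary.
        copy.state = "initializing"
        return
    # Assumption: a primary that drops out comes back as a failed replica.
    copy.role, copy.state = "replica", "failed"
    # Assumption: recoveries copying from the lost primary fail with it.
    for other in copies.values():
        if other.state == "initializing":
            other.state = "failed"
    # Promote the first copy that is still started, if there is one.
    for other in copies.values():
        if other.state == "started":
            other.role = "primary"
            break


def run_timeline():
    """Replay T1..T4 from the post and return a snapshot per step."""
    copies = {
        "D11": ShardCopy("primary", "started"),
        "D12": ShardCopy("replica", "started"),
        "D13": ShardCopy("replica", "started"),
    }
    snapshots = {"T1": summarize(copies)}
    for label, node in [("T2", "D12"), ("T3", "D11"), ("T4", "D13")]:
        master_blip(copies, node)
        snapshots[label] = summarize(copies)
    return snapshots


if __name__ == "__main__":
    for label, snapshot in run_timeline().items():
        print(label, snapshot)
```

Under these assumptions, no copy is left in the started state after T4, so there is nothing left to promote to primary, which matches what we observed.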
Does anyone have any input on this? Is our understanding correct?
Thanks
Imran