For operational continuity in a multi-node cluster, I have two questions.
The cluster contains 14 nodes: 7 on site A and 7 on site B. The Elasticsearch version is 6.4.
What happens if a) one site goes down unexpectedly and can be restored afterwards, say after a power failure? In this case the cluster routing is set to "all" the whole time. Does Elasticsearch recover by itself?
And what happens if b) one site goes down for planned work and the routing is set to "none" in advance?
Of course, all the shards have at least one replica and are well distributed over the cluster.
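To make the "routing" setting in a) and b) concrete: it is presumably the cluster-level setting `cluster.routing.allocation.enable`, changed through the cluster settings API (`PUT /_cluster/settings`). That mapping is an assumption, since the post does not name the setting. The following is only a minimal sketch that builds the request bodies; the helper name `allocation_settings` is hypothetical and nothing is sent to a cluster.

```python
import json

# Values accepted by cluster.routing.allocation.enable.
VALID_MODES = {"all", "primaries", "new_primaries", "none"}


def allocation_settings(mode, persistent=False):
    """Build a body for PUT /_cluster/settings that sets shard allocation.

    Assumption: "routing" in the post means cluster.routing.allocation.enable.
    """
    if mode not in VALID_MODES:
        raise ValueError("unknown allocation mode: %r" % mode)
    scope = "persistent" if persistent else "transient"
    return {scope: {"cluster.routing.allocation.enable": mode}}


# Scenario b): disable allocation before the planned work ...
before_maintenance = allocation_settings("none")
# ... and enable it again once the nodes have rejoined the cluster.
after_maintenance = allocation_settings("all")

print(json.dumps(before_maintenance))
print(json.dumps(after_maintenance))
```

Each body would be sent as JSON to `PUT /_cluster/settings` on any node of the cluster.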
Some days ago we had situation b) because site B was relocated to site C. After the relocation, once the nodes had booted on site C, all the nodes that should have been reallocating unassigned shards were throttled according to the recovery settings (simultaneous recoveries were set to 8). The only thing we could do to reach a green state was to reduce the replicas of the affected shards to 0.
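For reference, the settings involved in that incident can be sketched as request bodies. This assumes that "simultaneous recovery" refers to `cluster.routing.allocation.node_concurrent_recoveries` and that the replica count was changed via `index.number_of_replicas`; the post confirms neither name. The bandwidth value `40mb` is the documented default of `indices.recovery.max_bytes_per_sec`, not a value from our cluster, and both helper names are hypothetical.

```python
import json


def recovery_throttle_settings(concurrent_recoveries, max_bytes_per_sec):
    """Build a body for PUT /_cluster/settings with recovery throttling limits.

    Assumption: these are the settings behind the throttling described above.
    """
    return {
        "transient": {
            "cluster.routing.allocation.node_concurrent_recoveries": concurrent_recoveries,
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


def replica_settings(count):
    """Build a body for PUT /<index>/_settings that changes the replica count."""
    if count < 0:
        raise ValueError("replica count must not be negative")
    return {"index": {"number_of_replicas": count}}


# 8 concurrent recoveries as in the post; 40mb is the default bandwidth limit.
throttle = recovery_throttle_settings(8, "40mb")
# The workaround we used: drop the replicas of the affected indices to 0.
drop_replicas = replica_settings(0)

print(json.dumps(throttle))
print(json.dumps(drop_replicas))
```

Setting the replica count back to its earlier value afterwards uses the same `replica_settings` body with the original number.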
What is the correct way to relocate 50% of all the nodes?
Buy new nodes, place them on site C, add them to the cluster, and then shut down the nodes on site B?
Or shut down all nodes on site B without disabling routing?
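To illustrate the first option: if new nodes were added on site C, the site B nodes could presumably be drained before shutdown with shard allocation filtering (`cluster.routing.allocation.exclude._name`). Whether this is the recommended procedure is exactly what I am asking, so treat the following as a sketch only. The node names and the helper `exclude_nodes_settings` are hypothetical.

```python
import json


def exclude_nodes_settings(node_names):
    """Build a body for PUT /_cluster/settings that moves shards off the named nodes."""
    if not node_names:
        raise ValueError("at least one node name is required")
    return {
        "transient": {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


# Hypothetical names for the 7 nodes on site B.
site_b_nodes = ["site-b-node-%d" % i for i in range(1, 8)]
drain_site_b = exclude_nodes_settings(site_b_nodes)

print(json.dumps(drain_site_b, indent=2))
```

Once no shards remain on the excluded nodes, they could be shut down and the exclusion removed again.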
Thank you very much for your inputs!