We are working on redesigning our current cluster. Our 40 ES data nodes are split between 2 data centers, 20 in each. Currently there are 3 dedicated master nodes: 2 in one DC and 1 in the other. The goal with the two data centers is to have each primary shard in one DC and its replica in the other.
Theoretically, if the data center holding 2 of the 3 masters went offline, a new master election would fail because the one surviving master is not enough for a quorum. Increasing the number of masters doesn't fix this: with 3, 5, 7, etc. split across two DCs, one DC always holds the majority, so losing that DC causes the same issue.
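To make the quorum arithmetic concrete, here is a minimal sketch in plain Python (no Elasticsearch calls). It assumes a quorum is a strict majority of the master-eligible nodes; the function names are made up for illustration.

```python
def quorum(masters: int) -> int:
    """Smallest strict majority of `masters` voting nodes."""
    return masters // 2 + 1


def survives_dc_loss(dc_a: int, dc_b: int) -> dict:
    """For a two-DC split of masters, report whether the remaining DC
    still holds a quorum after the other one goes offline."""
    needed = quorum(dc_a + dc_b)
    return {
        "lose_a": dc_b >= needed,  # only DC B's masters remain
        "lose_b": dc_a >= needed,  # only DC A's masters remain
    }


def any_split_tolerates_either_loss(masters: int) -> bool:
    """True if some two-DC split survives the loss of either DC."""
    return any(
        all(survives_dc_loss(a, masters - a).values())
        for a in range(masters + 1)
    )


# Our current layout: 2 masters in one DC, 1 in the other.
print(survives_dc_loss(2, 1))

# No split of 3, 5, or 7 masters across two DCs survives either outage.
for n in (3, 5, 7):
    print(n, quorum(n), any_split_tolerates_either_loss(n))
```

With our 2 + 1 layout, losing the DC with 1 master is fine, but losing the DC with 2 masters leaves the cluster without a quorum, and no other two-DC split changes that.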
Is there a solution to this problem?
Thanks!