Four-node Elasticsearch cluster across AWS Availability Zones

Hi, I have a four-node Elasticsearch cluster spread across two AWS availability zones, with two nodes in each. "discovery.zen.minimum_master_nodes" is set to 3, which means that three master-eligible nodes have to be available for a master to be elected and the cluster to form.
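For reference, the value 3 matches the usual quorum rule for zen discovery: (number of master-eligible nodes / 2) + 1, rounded down. A minimal Python sketch of that arithmetic, assuming all four nodes are master-eligible (the function name `quorum` is only illustrative and is not part of Elasticsearch):

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


# Four master-eligible nodes need three to agree on a master.
print(quorum(4))
# Three master-eligible nodes need only two.
print(quorum(3))
```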

My question is about what will happen if the network is partitioned between the AZs. I am assuming the whole cluster will go down, as only two nodes will be available on each side. What would be the process to bring the cluster up again in this scenario, and is this the best configuration?

Thanks.

MVZ

It is super easy to simulate that locally.
Just start 4 nodes, then kill one of them and restart it.

Basically, when the node joins again, your cluster will react again.

Hi,

Thanks for the response. However, my question is specifically about the network being partitioned, with two nodes up in each datacentre. I would like to understand how to get the Elasticsearch cluster back online in this scenario, given that three nodes must be available to achieve quorum.

I guess another question would be: what is the best architectural practice when building an Elasticsearch cluster across availability zones in AWS? Are two nodes in each AZ sufficient, or should I have an odd number of nodes?

Thanks.

MVZ.

You are correct that in this scenario the cluster would not be able to elect a master if 2 nodes were separated from the rest. It is generally better to have an odd number of master-eligible nodes to avoid an even split, and you could get this by making one of the nodes not master-eligible. With three master-eligible nodes, "discovery.zen.minimum_master_nodes" should then be set to 2. Once this is done, a network partition separating out one or two nodes would still allow the part of the cluster with at least 2 master-eligible nodes to continue operating properly.
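To make the difference between the two layouts concrete, here is a rough Python sketch that checks which side of a partition could still elect a master. It is a toy model, not Elasticsearch code: the helper names are hypothetical, each side of the partition is a list of flags (True for a master-eligible node), and the data-only node is assumed to have been configured with "node.master: false".

```python
def quorum(master_eligible_nodes):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def sides_with_master(partition, minimum_master_nodes):
    """For each side of the partition, report whether it sees enough
    master-eligible nodes (True flags) to elect a master."""
    return [sum(side) >= minimum_master_nodes for side in partition]


# Current layout: four master-eligible nodes, split 2/2 between the AZs.
four_eligible = [[True, True], [True, True]]
print(sides_with_master(four_eligible, quorum(4)))

# Suggested layout: one node is no longer master-eligible,
# leaving three master-eligible nodes and a minimum of 2.
three_eligible = [[True, True], [True, False]]
print(sides_with_master(three_eligible, quorum(3)))
```

In this model neither side of the first layout can elect a master, while in the second layout the AZ holding two master-eligible nodes keeps running. The other AZ still cannot elect a master on its own, so this layout would not survive the loss of the AZ that holds the two master-eligible nodes.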

Hi,

Thanks for the response, Christian, very helpful. So would I be correct in assuming that, in a network partition scenario, I could manually lower the value of "discovery.zen.minimum_master_nodes" to 2 in the config on the nodes in one Availability Zone, and that this would bring those two nodes (and so the cluster) back online?

Thanks again,

MVZ