Elasticsearch Cluster Across Availability Zones in AWS

Hi, I have a four-node Elasticsearch cluster in AWS, with two nodes in each of two availability zones. "discovery.zen.minimum_master_nodes" is set to 3, which means that three master-eligible nodes have to be available for the cluster to be established.

My question is about what will happen if the network is partitioned between the AZs. I am assuming the whole cluster will go down, as only two nodes will be available on each side. What would be the process to bring the cluster up again in this scenario, and is this the best configuration?
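For reference, the arithmetic behind that assumption can be sketched as below. This is only an illustration: the helper names `quorum` and `can_elect_master` are made up for this post and are not part of Elasticsearch, and it assumes all four nodes are master-eligible.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of the master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes: int, minimum_master_nodes: int) -> bool:
    """One side of a partition can elect a master only if it sees enough nodes."""
    return visible_nodes >= minimum_master_nodes


min_masters = quorum(4)  # four master-eligible nodes in total

# Nodes visible on each side once the link between the AZs is cut.
for az, visible in {"AZ-A": 2, "AZ-B": 2}.items():
    print(az, "can elect a master:", can_elect_master(visible, min_masters))
```

With two nodes per side and a setting of 3, neither side reaches the threshold, which matches the "whole cluster goes down" expectation.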

Thanks.

Mike.

Yep.

Start another node and join it to the cluster.

Well, if you want cross-AZ HA then there are things you need to balance and be prepared for, like this problem ;)
You could have a "tie breaker" node in another AZ though.

Would it be an option to change the "discovery.zen.minimum_master_nodes:" parameter from 3 to 2 on the two nodes in one AZ and then restart the nodes to bring up the cluster in that AZ?

That'd work as well.

But then you need to restart the cluster again when you bring the other AZ back up.

Ok,

Maybe my understanding is wrong, but wouldn't the other two nodes in AZ-B just join the cluster in AZ-A when the network partition is over? The cluster will already have formed in AZ-A as a result of me manually changing the "discovery.zen.minimum_master_nodes" config to 2.

Thanks Michael.

They will rejoin but you are at risk of split brain as you reduced the min masters setting.
You could update it via the APIs, but it should ideally be a static setting.
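A minimal sketch of what updating it via the cluster settings API could look like, assuming a pre-7.x cluster where this setting still exists. The address `http://localhost:9200` and the helper name `build_min_masters_request` are placeholders for illustration. The request is only built here, not sent, and as far as I know the update needs an elected master to be applied, so it is more useful for restoring the value afterwards than for recovering a cluster that has no master.

```python
import json
import urllib.request


def build_min_masters_request(base_url: str, value: int) -> urllib.request.Request:
    """Build (but do not send) a PUT to the cluster settings endpoint."""
    body = json.dumps(
        {"persistent": {"discovery.zen.minimum_master_nodes": value}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical address; replace with a reachable node in your cluster.
request = build_min_masters_request("http://localhost:9200", 3)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```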

Hey thanks again for the reply.

Please correct me if I am wrong but the scenario would be:

  • Network Partition Event Occurs
  • All four nodes (two in AZ-A and two in AZ-B) have discovery.zen.minimum_master_nodes set to 3
  • Cluster goes down
  • I reconfigure nodes in AZ-B to have a discovery.zen.minimum_master_nodes value of 2
  • Cluster comes up in AZ-B only (two nodes only)
  • When the network partition event is over, nodes in AZ-A join the cluster via the master node in AZ-B and the cluster continues to operate without any issues

Above is my understanding of how it will work; however, I may be wrong, so please correct me if I am. I just don't see how it will result in split brain in this scenario, as the two nodes in AZ-A should simply rejoin the cluster when the network partition event is over, since there will be an existing master elected at that point.

Thanks,

Mike.

What if you get a network partition again?

It should be exactly the same scenario repeated.

As you can see in the scenario above I am never configuring all four nodes with a discovery.zen.minimum_master_nodes value of 2. Only two nodes to force the cluster to come up in a single site.

Does this make sense? I am not sure how this could result in split brain, if you could explain your logic that would be really helpful.

Regards,

Mike.

[quote="mvz00, post:7, topic:75416"]
Network Partition Event Occurs
All four nodes (two in AZ-A and two in AZ-B) have discovery.zen.minimum_master_nodes set to 3
Cluster goes down
I reconfigure nodes in AZ-B to have a discovery.zen.minimum_master_nodes value of 2[/quote]
This makes sense.

But at this point you have 4 nodes with a min master setting of 2. If you get another network partition at this point you will have a split cluster, as both sides think they have 2 master eligible nodes.
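To make that concrete, here is a rough sketch comparing the two settings under a 2/2 partition. The function name `sides_able_to_elect` is hypothetical, and the model is deliberately simplified: a side is treated as able to elect a master whenever the number of master-eligible nodes it can see meets the setting.

```python
def sides_able_to_elect(partition: dict, minimum_master_nodes: int) -> list:
    """Return the sides of a partition that see enough nodes to elect a master."""
    return [side for side, visible in partition.items()
            if visible >= minimum_master_nodes]


# Four master-eligible nodes, split evenly by the partition.
partition = {"AZ-A": 2, "AZ-B": 2}

for setting in (3, 2):
    winners = sides_able_to_elect(partition, setting)
    if len(winners) > 1:
        outcome = "split brain (each side elects its own master)"
    elif winners:
        outcome = "one master on " + winners[0]
    else:
        outcome = "no master anywhere"
    print("minimum_master_nodes =", setting, "->", outcome)
```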

To handle split-brain scenarios, you can set up a small dedicated master node in one of the availability zones to act as a tiebreaker, while keeping minimum_master_nodes set to 3. In a network partition between the AZs, the side with the tiebreaker would continue taking updates while the other side would be read-only. If a full AZ goes down, you then have a 50% chance that the cluster will continue to work without manual intervention. If the side containing the tiebreaker goes down, however, you would need to confirm that this is the case (and rule out a network partition) and take manual action, e.g. bring up a new tiebreaker node.
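The tiebreaker cases can be walked through with a small sketch. It assumes the tiebreaker sits in AZ-A (an arbitrary choice for illustration), that all five nodes are master-eligible, and it uses a hypothetical helper `has_quorum`.

```python
MINIMUM_MASTER_NODES = 3

# Master-eligible nodes per AZ; AZ-A is assumed to host the tiebreaker.
nodes_per_az = {"AZ-A": 3, "AZ-B": 2}
total = sum(nodes_per_az.values())


def has_quorum(visible_nodes: int) -> bool:
    """True if the visible nodes are enough to elect a master."""
    return visible_nodes >= MINIMUM_MASTER_NODES


# Network partition between the AZs: each side sees only its own nodes.
partition_result = {az: has_quorum(n) for az, n in nodes_per_az.items()}

# Full outage of one AZ: the survivors are all nodes outside the lost AZ.
outage_result = {az: has_quorum(total - n) for az, n in nodes_per_az.items()}

print("partition, side keeps a master:", partition_result)
print("outage of AZ, cluster keeps a master:", outage_result)
```

Only the side holding the tiebreaker keeps a master during a partition, and the cluster survives the loss of AZ-B but not of AZ-A, which is where the 50% figure comes from.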
