I want to have dedicated master-eligible nodes. I know the recommendation is setting minimum_master_nodes to (N/2) + 1, rounded down, and I want to establish an adequate value for N. I've also read multiple recommendations on making N an odd number, but that's not an option if I have two racks.
What will happen if one of the racks is completely lost (or there's a network split)?
Scenario 1: Master is one node of the first rack. There's a network connectivity error between rack 1 and rack 2 or rack 2 completely crashes.
If I set N=4, the minimum will be 3. I think (correct me if I'm wrong) that this will prevent split brain, since a master election on the second rack (only 2 master-eligible nodes) can never reach quorum. However, I'm not sure whether the nodes on the first rack will initiate a master election (a master is alive there, but there's no quorum either).
If N=5 I would have 3 + 2 eligible masters. The minimum would still be 3, so the same problems apply (basically only one of the two racks has the possibility to form a cluster), and an odd number doesn't really help here.
If N=6 I would have 3 + 3, the minimum would be 4, and the same problems apply... you get the idea.
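To make the arithmetic above concrete, here's a small sketch (plain Python, not Elasticsearch code) that computes the quorum floor(N/2) + 1 and checks whether either side of a two-rack split can still elect a master:

```python
def quorum(n_masters: int) -> int:
    # Recommended minimum_master_nodes: floor(N / 2) + 1
    return n_masters // 2 + 1

def split_outcome(rack1: int, rack2: int) -> tuple[bool, bool]:
    # A side can elect a master only if it still sees a quorum
    # of master-eligible nodes after the split.
    q = quorum(rack1 + rack2)
    return rack1 >= q, rack2 >= q

# N=4, split 2+2: neither side can elect -> cluster down, but no split brain
print(split_outcome(2, 2))  # (False, False)

# N=5, split 3+2: only the 3-node rack keeps a quorum
print(split_outcome(3, 2))  # (True, False)

# N=6, split 3+3: again neither side can elect
print(split_outcome(3, 3))  # (False, False)
```

Whatever even or odd N you pick, no symmetric two-rack layout lets both sides stay safe and available at once, which is the core of the problem described above.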
Scenario 2: Master is one node of the first rack. There's a complete outage of rack 1.
The surviving rack will not be able to elect a master if it holds fewer than quorum master-eligible nodes - which is guaranteed with an even 2+2 or 3+3 split, and happens with a 3+2 split whenever the majority rack is the one that fails.
What's the recommendation for N in a multiple-rack scenario? (having 3 racks maybe?)
Sadly, the number 2 is really not a friend of distributed systems when it comes to availability. The system is either unsafe (subject to split brain) or goes down as soon as one of the masters goes down. Note that 2 is still good for data redundancy - i.e., you won't lose your configuration if one node dies.
All in all you have two options: either accept the above (or run with a single master, which is equivalent from an availability perspective), or use a third rack and place a master on it. Note that you don't need to put data nodes on that third rack.
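As a sketch, the tie-breaker on the third rack can be a dedicated master-eligible node that holds no data. In pre-7.x Elasticsearch (the versions that use minimum_master_nodes) its configuration would look roughly like this:

```yaml
# elasticsearch.yml on the rack-3 tie-breaker node (pre-7.x settings)
node.master: true                       # master-eligible, counts toward quorum
node.data: false                        # holds no shards, so no data traffic to rack 3
discovery.zen.minimum_master_nodes: 2   # quorum for N=3 masters: floor(3/2) + 1 = 2
```

With one master-eligible node per rack (N=3, quorum 2), losing any single rack still leaves two masters that can form a cluster.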
What happens if I have two racks with very low latency and a third rack that, for various reasons, has higher latency and is used as a tie-breaker?
Is there a way to force the tie-breaker to participate in the master election but ensure that it never becomes the master? We are concerned that the higher latency of this node could become a problem if it is elected master.
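With Zen discovery (pre-7.x) there is no supported way to let a node vote without also being electable. From Elasticsearch 7.3 onward, however, a master-eligible node can be marked voting-only: it participates in elections and counts toward the quorum but can never become master itself, which is exactly the tie-breaker role described above. A rough sketch (setting names assume 7.3+):

```yaml
# elasticsearch.yml on the rack-3 tie-breaker (Elasticsearch 7.3+)
node.master: true        # master-eligible, so it counts toward the voting quorum
node.voting_only: true   # can vote in elections but can never be elected master
node.data: false         # dedicated tie-breaker: no shards, no data traffic
```

This keeps the high-latency node out of the master role while still letting it break ties between the two low-latency racks.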