Odd number of master nodes

I was advised to use an odd number of master-eligible nodes to avoid the split-brain problem, and also because electing a master is supposedly easier when an odd number of master nodes is configured.

However, say the cluster is configured with 5 master-eligible nodes and one of them goes down - then there will effectively be 4 nodes (an even number again). Won't that pose the same challenge when voting for a master?

I'm not sure about the advice you heard; the recommendation I hear most of the time is 3 master-eligible nodes. That's what I use in my own clusters, which range from 6 to 20 data nodes.

The point about not getting a split brain is that you want to configure your master-eligible nodes in such a way that, if there is a network outage, they don't form two or more separate clusters. You prevent this by requiring a minimum number of master-eligible nodes to be available before a master can be elected.

I'm still using ES6 and I know this is configured differently in ES7+, but the effect must still be the same: you require a majority (more than half) of the master-eligible nodes to be available for a new master to be elected and a cluster to form.
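
For reference, in ES6 I set this with the discovery.zen.minimum_master_nodes setting in elasticsearch.yml - roughly like this (just a sketch of my own setup; as far as I know ES7+ works out the quorum on its own, so this setting is no longer used there):

```yaml
# elasticsearch.yml on each master-eligible node (ES6, 5 master-eligible nodes in total)
# quorum = floor(5 / 2) + 1 = 3
discovery.zen.minimum_master_nodes: 3
```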

Let's say I start out with 5 master-eligible nodes in my cluster; then I must set the minimum to 3. If there is a network outage, some of the data and master-eligible nodes will still be in contact with the elected master. If at least 3 master-eligible nodes are in this part of the cluster, it will keep working. And because at least 3 are on this side, the nodes that are cut off can have at most 2 master-eligible nodes on their side of the network break - one too few to elect a new master. Hence they can't form a cluster and will not create a split-brain problem.

And vice versa: if the current master only sees one other master-eligible node, it is below the minimum and will step down and stop this part of the cluster - because the nodes it can no longer communicate with may have formed a new cluster (since they may hold the remaining three master-eligible nodes).
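
If it helps, here is a minimal sketch of that arithmetic in plain Python (not Elasticsearch code; the function names are just mine):

```python
def quorum(master_eligible_total: int) -> int:
    """Minimum number of master-eligible nodes needed to elect a master."""
    return master_eligible_total // 2 + 1

def can_elect(master_eligible_total: int, visible_on_this_side: int) -> bool:
    """Can this side of a network partition elect (or keep) a master?"""
    return visible_on_this_side >= quorum(master_eligible_total)

# 5 master-eligible nodes, split 3 + 2 by a network outage:
print(quorum(5))        # 3
print(can_elect(5, 3))  # True  -> the majority side keeps working
print(can_elect(5, 2))  # False -> the minority side steps down, no split brain
```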

I hope that answered your question.

It depends what you mean by "split brain" -- we normally take this to mean that the system splits into two parts that make inconsistent decisions, and when correctly configured Elasticsearch doesn't ever suffer from this. If you have 4 master-eligible nodes then voting requires at least 3 of them to agree, so there's no way you can split it into two parts without one of those parts stopping working.

An alternative meaning of the term "split brain" is that a network partition can stop the whole system from working until the network is fixed. If you have 4 master-eligible nodes then a 2+2 split would leave neither half with enough nodes to vote. There's no general solution to this; even with 5 nodes you can experience a 2+2+1 partition, which leaves the cluster completely unavailable.
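
To put numbers on it, a quick sketch (plain Python, not Elasticsearch code) of why neither of those splits leaves any fragment with enough votes:

```python
def quorum(total_master_eligible: int) -> int:
    # A vote needs a strict majority of all master-eligible nodes.
    return total_master_eligible // 2 + 1

def available_sides(partition_sizes: list[int]) -> list[bool]:
    # For each fragment of a partitioned cluster, can it still elect a master?
    total = sum(partition_sizes)
    return [size >= quorum(total) for size in partition_sizes]

print(available_sides([2, 2]))     # [False, False]        -> 4 nodes, 2+2 split: unavailable
print(available_sides([3, 1]))     # [True, False]         -> majority side stays up
print(available_sides([2, 2, 1]))  # [False, False, False] -> 5 nodes, 2+2+1 split: unavailable
```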

Elasticsearch does take some best-effort precautions to avoid becoming unavailable with a 2+2 split but they're not something on which you should rely.
