Avoiding the Split Brain

Hi!
I have a cluster with two nodes.
Can I avoid split brain if I always send create/update/delete requests to one of these nodes?
(I know there are other solutions)

I don't think that would help. If the nodes lost connectivity, each would still decide it was the master of its own single-node cluster. The one you update would then differ from the one you didn't, so they would still have 2 different views of the same indexes when they came back together.

If they live on the same switch, then the only way they get partitioned is if the switch dies, in which case you can't talk to them anyway.

The only way I can see to avoid split brain, though, is to set

discovery.zen.minimum_master_nodes: 2
in
elasticsearch.yml

That way, unless both nodes are online they won't do anything. The downside is that if one node dies you lose the cluster regardless, but you can then always set the value to 1 and restart the node that is still alive.


+1 to the above. Also, Elasticsearch does request routing on your behalf, so you don't need to think about where you send the create/update/delete requests. They will be routed to the correct index and its shards as per your commands.

Thank you

The problem is not that a server goes down completely, but that the servers cannot communicate with each other. Then both can become master.

If I set discovery.zen.ping_timeout to 60s, the chance of split brain will be very low. If they can't communicate for more than 60 seconds, one has probably gone down completely.

If you only have two nodes in your cluster, I think a better solution would be to make one of them always be the master by setting node.master: false in elasticsearch.yml for the other node. That's the only way you can guarantee no split brain in your 2-node cluster.
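
As a rough sketch of what that could look like (assuming a pre-7.x release that still uses the node.master and node.data settings; the two sections below stand for two separate elasticsearch.yml files, and keeping data on both nodes is my assumption, not something stated above):

```yaml
# elasticsearch.yml on the node that should always be the master
node.master: true
node.data: true
---
# elasticsearch.yml on the other node: holds data, but is never
# eligible to be elected master, so it cannot form its own cluster
node.master: false
node.data: true
```

With only one master-eligible node there is nothing to elect against, which is why split brain cannot occur, at the cost of the cluster having no master whenever that node is down.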


Thanks for your answer, Bernt_Rostad.
In that case I lose the high-availability advantage. If the master node goes down, nothing will work.

My nodes live in the same network. If they do not communicate with each other for more than 1 minute, then there is a high probability that one has gone down, and the second node can take over and become master.

With just 2 nodes you cannot achieve true high availability. If one node goes down or the two nodes lose connectivity, the cluster should not be able to elect a master, as a node cannot determine whether the other node is down, doing a long GC, or just disconnected. I would recommend adding a third, small dedicated master node so you can reach a majority even if one of the nodes goes down or is disconnected.
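
A minimal sketch of the third node's configuration, assuming the same pre-7.x settings used elsewhere in this thread (the comments are illustrative, not taken from any official example):

```yaml
# elasticsearch.yml on the third, small node
# Master-eligible, so it counts towards the election majority
node.master: true
# Holds no shards, so it needs little disk and memory
node.data: false
```

The two existing nodes would stay master-eligible data nodes, and discovery.zen.minimum_master_nodes would be 2 on all three, so any two nodes that can see each other form a majority.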


I understand that 3 nodes is the best solution. But if the master node is so busy that it can't communicate with the other nodes for 1-2 minutes, then even 3 nodes will not help with high availability.

For our search solution it is OK if the system does not respond for 1-2 minutes. But if the master goes down, we want a backup system (the second node) to take control after 1-2 minutes.

If the current master node is not able to respond, the remaining nodes can automatically elect another master which will make the cluster available again, so it does help.

You can not have this happen automatically with just 2 nodes without risking a split brain scenario. This would require manual intervention.

If the current master node is not able to respond, the remaining nodes can automatically elect another master which will make the cluster available again, so it does help.

What I do not understand is why it's OK for two nodes to elect another master but not for a single node. What will happen if the remaining nodes automatically elect another master and our lost master comes back? Then we again have two master nodes.

In Elasticsearch, electing a master requires a majority of master-eligible nodes. This means that 2 nodes must be present in both cases. If the master goes away or is partitioned off in a three-node cluster, it will no longer be a master, as it does not have a majority of nodes behind it. The two nodes that are still up and in contact can form a majority and elect a new master. When the former master node comes back, it will join as a non-master node.

If you only have two nodes, no master can be elected unless both nodes are present. You can still read, but not write, in order to prevent data loss.
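
To make the arithmetic concrete, here is a small annotated fragment. It assumes the usual majority rule for this setting, (master-eligible nodes / 2) + 1 with integer division; the 5-node row is added purely for illustration:

```yaml
# majority = (master_eligible_nodes / 2) + 1, using integer division
#   2 master-eligible nodes -> 2  (losing either node blocks elections)
#   3 master-eligible nodes -> 2  (any one node can be lost)
#   5 master-eligible nodes -> 3  (any two nodes can be lost)
discovery.zen.minimum_master_nodes: 2
```

Note that the value is the same for 2 and 3 nodes. The third node adds no extra requirement; it only means a majority can still be reached after one node is lost.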

Thank you. Now I understand how it works.
