Split brain on a 4-node implementation


(Edmund Cheng) #1

Hi,
I have done some reading about the split-brain scenario, but seem to have hit a dilemma.
For example, take this structure:

machine1: 2 ES nodes (with rack id 1)
machine2: 2 ES nodes (with rack id 2)

In the above case, all the settings are at their defaults except for awareness, which I configured so that the primary and replica shards are distributed across racks.
So if there is a network issue between the 2 machines, I will have a split-brain scenario, since minimum_master_nodes is 1.
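
A sketch of what the setup described above might look like in elasticsearch.yml for one of the two nodes on machine1, assuming ES 6.x. The cluster name, node name, and the attribute name rack_id are placeholders chosen for illustration, not taken from the actual deployment.

```yaml
# elasticsearch.yml for one of the two nodes on machine1 (sketch, ES 6.x).
# cluster.name and node.name are hypothetical placeholders.
cluster.name: my-cluster
node.name: machine1-node-a

# Custom node attribute used for awareness.
# The nodes on machine2 would set this to "2".
node.attr.rack_id: "1"

# Try to place the primary and replica copies of a shard
# on nodes with different rack_id values.
cluster.routing.allocation.awareness.attributes: rack_id

# Left at its default of 1, which lets each side of a
# network partition elect its own master.
# discovery.zen.minimum_master_nodes: 1
```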

  1. I can set minimum_master_nodes to 3 to avoid split brain:
    • but ES will be inoperable if a network issue happens, and
    • if 1 machine goes down, ES is also inoperable, thus losing HA
  2. If I set minimum_master_nodes to anything less than 3:
    • I will get split brain if there is a network issue between the 2 machines
    • but if 1 machine goes down, I still have 1 machine that is working (HA)

I am not sure whether this is the right design or whether I am missing some better configuration.
Any advice would be great!


(Christian Dahlqvist) #2

You should set minimum_master_nodes to 3, as this will allow 1 node to go down. In order to build an HA cluster where a machine can be allowed to fail while still allowing the cluster to be fully operational, a minimum of 3 machines is required (the third can however be much smaller, as it generally only needs to host a single, dedicated master node).
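
As a rough illustration of the third-machine suggestion, a dedicated master node could be configured along these lines in ES 6.x. This is a sketch under assumptions: the cluster name, node name, and the host names machine1 to machine3 are hypothetical, and it assumes all four existing nodes stay master-eligible.

```yaml
# elasticsearch.yml for a dedicated master node on a small third machine
# (sketch, ES 6.x; names and hosts are hypothetical placeholders).
cluster.name: my-cluster
node.name: machine3-master

# Master-eligible only: holds no shards and runs no ingest pipelines.
node.master: true
node.data: false
node.ingest: false

# Seed hosts used to find the rest of the cluster.
discovery.zen.ping.unicast.hosts: ["machine1:9300", "machine2:9300", "machine3:9300"]

# 2 + 2 + 1 = 5 master-eligible nodes, so the majority is 5 / 2 + 1 = 3.
# The same value should be set on every master-eligible node.
discovery.zen.minimum_master_nodes: 3
```

With 5 master-eligible nodes, losing machine1 or machine2 still leaves 3, so a master can be elected. If only the link between machine1 and machine2 breaks, just the side that can still reach the third machine has 3 nodes, so at most one master exists.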


(Eason Lau) #3

FYI. https://www.elastic.co/guide/en/elasticsearch/reference/6.0/modules-discovery-zen.html#master-election

# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3

Here, the total number of master-eligible nodes means every ES node that has the config below.

node.master: true


(Edmund Cheng) #4

Thanks Christian,

"You should set minimum_master_nodes to 3, as this will allow 1 node to go down. "

Yes, this is the best it can do I guess, because if one more node goes down, ES will be inoperable since the minimum_master_nodes of 3 is not met.

"In order to build an HA cluster where a machine can be allowed to fail while still allowing the cluster to be fully operational, a minimum of 3 machines is required (the third can however be much smaller, as it generally only needs to host a single, dedicated master node)."

So basically, for these 2 machines with 2 nodes each, it's not really an ideal implementation if we need HA and want to avoid split brain at the same time? What do you mean by (much smaller)?


(Edmund Cheng) #5

Hi, yes, I understand that setting, but I will lose ES if one machine goes down, since minimum_master_nodes is not met. That avoids split brain, but causes ES to stop working.


(Christian Dahlqvist) #6

Correct. For a small cluster with a reasonably sized cluster state, 1-2 CPU cores and 2-4GB of heap for a dedicated master node is often enough.


(Edmund Cheng) #7

Hi, I see. Thanks.

