Hi,
In our organization we've set up three nodes: two of them store data, and the third acts as a kind of 'tiebreaker'.
When one node fails, the master role switches smoothly to the other machine and everything keeps working - that part is fine!
The problem begins when two nodes fail. Say, for example, the tiebreaker (node-3) and the data node that was not currently the master (node-2) both go down. That leaves only node-1, which had previously been elected master. Unfortunately, this remaining node also stops working after node-3 and node-2 fail. Why is that? If node-1 was previously the master, why does it stop working?
You always need a majority of master-eligible nodes available, so with three nodes you can only afford to lose one. To be able to handle two nodes failing you would need at least five master-eligible nodes.
The remaining node cannot tell whether it has been separated from the other nodes by a network issue or whether they have crashed. If both sides of a network partition were able to keep accepting writes, you would have a split-brain scenario and lose data. To ensure that only one part of the cluster can accept writes, a strict majority of master-eligible nodes needs to be available, which is why a single surviving node cannot elect itself master and continue operating.
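The majority rule above comes down to simple arithmetic. Here is a small sketch (the helper names are my own illustration, not part of any Elasticsearch API) showing why three nodes tolerate one failure and five nodes tolerate two:

```python
# Quorum arithmetic behind the "majority of master-eligible nodes" rule.
# These helpers are illustrative only, not Elasticsearch code.

def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority: more than half of the voting nodes."""
    return master_eligible_nodes // 2 + 1

def tolerable_failures(master_eligible_nodes: int) -> int:
    """How many nodes can fail while a quorum can still be formed."""
    return master_eligible_nodes - quorum(master_eligible_nodes)

for n in (3, 5):
    print(f"{n} nodes: quorum={quorum(n)}, can lose {tolerable_failures(n)}")
# With 3 nodes the quorum is 2, so losing two nodes (your scenario) leaves
# the survivor without a majority and it steps down rather than risk split brain.
```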