I'm managing multiple ES clusters. Each cluster consists of:
3 dedicated master nodes
multiple data nodes
To prevent the "split brain" problem, I've set discovery.zen.minimum_master_nodes: 2.
Last week, one of the master nodes died, but the entire ES cluster kept working without any problem. So far so good!
But if another master had gone down while the dead server was being recovered, my cluster would have fallen into READ-ONLY mode, which is a big problem because my realtime data couldn't be saved. My boss is considering setting discovery.zen.minimum_master_nodes: 1 to prevent this situation.
So I'm wondering what the best practice is for setting up master nodes to achieve H/A. I'd like to know how other engineers set up their master nodes.
Setting discovery.zen.minimum_master_nodes: 1 potentially opens you to split brain, and that is Bad.
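To make the split brain point concrete, here is a rough sketch of the arithmetic. The function name is made up for illustration and is not part of any Elasticsearch API. It only encodes the idea that two isolated groups of master-eligible nodes can each elect a master if both groups can satisfy minimum_master_nodes at the same time.

```python
def split_brain_possible(master_eligible: int, minimum_master_nodes: int) -> bool:
    """Hypothetical helper: can two disjoint partitions both elect a master?

    That is only possible if two groups, each holding at least
    minimum_master_nodes master-eligible nodes, fit inside the cluster.
    """
    return 2 * minimum_master_nodes <= master_eligible


# The setup from the question: 3 dedicated master nodes.
print(split_brain_possible(3, 1))  # True: a lone isolated master can elect itself
print(split_brain_possible(3, 2))  # False: only one side can ever hold 2 of 3
```

So with 3 masters, a setting of 2 is the smallest value that rules out two masters being elected at once.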
Having 3 masters is the best idea.
You may be thinking, "Well, what if I had 5? Then we could lose 2 and still be OK!" I guarantee someone will then ask, "But what if we lose 3 masters?", and the same circular reasoning starts again, so they'll ask to move to 7. By that point the business will be asking, "Why are we paying for all this underutilised infrastructure?"
At that point, the risk of losing more than N masters is smaller than the cost of running 3+N masters to try to beat that what-if risk.
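As a back-of-the-envelope illustration of that trade-off, the sketch below applies the usual quorum rule of (master-eligible nodes / 2) + 1 to 3, 5 and 7 masters. The helper names are just for this example and are not Elasticsearch functions.

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes, i.e. the usual minimum_master_nodes value."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many masters can be lost while a quorum is still reachable."""
    return master_eligible - quorum(master_eligible)


for n in (3, 5, 7):
    print(f"{n} masters: minimum_master_nodes={quorum(n)}, "
          f"can lose {tolerated_failures(n)}")
```

Each extra tolerated failure costs two more dedicated master nodes, which is the under-utilised infrastructure the business ends up asking about.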
If you are worried about this, then the best way to negate the risk is to have the ability to very quickly spin up a new master to replace the lost one, aka automation, which you likely already have now. So you still have a risk, but you are balancing that with the ability to get back to green ASAP.