When talking about nodes, the ES documentation says: "To ensure that your master node is stable and not under pressure, it is a good idea in a bigger cluster to split the roles between dedicated master-eligible nodes and dedicated data nodes." https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
Now the question is what a "bigger cluster" means: how do you decide whether your cluster is too big to share roles on the same nodes and needs dedicated master-eligible nodes?
It is not necessarily just a question of the size of the cluster, but also the load it is under. If you have combined master/data nodes, all nodes need to have enough resources available to handle master duties. Once nodes start coming under pressure, e.g. due to increased data volumes or traffic spikes resulting in longer GC pauses, the risk of instability increases.
Clusters with 3-5 data nodes can often get away without dedicated master nodes unless the load is heavy or unpredictable, but there is no hard limit on what constitutes a 'bigger' cluster.
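For reference, here is a minimal sketch of what the role split looks like in elasticsearch.yml, assuming a pre-7.x cluster where roles are controlled via the node.master / node.data settings:

```yaml
# elasticsearch.yml on a dedicated master-eligible node
# (handles cluster state and coordination only, holds no data)
node.master: true
node.data: false
node.ingest: false

# elasticsearch.yml on a dedicated data node
# (holds shards and serves indexing/search, never elected master)
node.master: false
node.data: true
node.ingest: true
```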
Another consideration is the speed at which you are expanding the cluster. Changing the number of master-eligible nodes in a cluster means that the minimum_master_nodes setting also needs to be updated every time. If you need to be able to easily scale the cluster up or down, this is generally a lot easier with a fixed number of dedicated master nodes.
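To illustrate (again assuming a pre-7.x cluster using zen discovery): minimum_master_nodes should be set to a quorum of the master-eligible nodes, i.e. (master_eligible / 2) + 1, so with three dedicated master nodes it stays fixed at 2 no matter how many data nodes you add or remove:

```yaml
# elasticsearch.yml on every master-eligible node
# quorum = (number of master-eligible nodes / 2) + 1
# with 3 dedicated masters this is 2 and never changes as data nodes
# are added or removed; the setting can also be updated at runtime
# via the cluster settings API
discovery.zen.minimum_master_nodes: 2
```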
Ok, so basically if I cannot predict/limit the traffic, the best way is to go with dedicated master nodes, right?
Regarding the minimum_master_nodes updates, I only plan to scale up. Are there any technical difficulties in updating the parameter, e.g. handling the race between updating the parameter and adding a new node?