I was wondering what the advantages and disadvantages are of running a single cluster with 3 mixed data/master nodes, compared with one dedicated master node and two data nodes.
Would the choice affect cluster performance, stability, JVM heap usage, or anything else?
If you have one (or two) master-eligible nodes then your cluster is not resilient to the loss of a master-eligible node: if a master-eligible node fails, or you need to take it down for maintenance, then the whole cluster will become unavailable until the lost node is restored. If you have three or more master-eligible nodes then this is no longer true: the cluster can still elect a master and carry on working even if one of the nodes is lost.
Conversely, if the elected master is a data node then it must also perform searches and indexing which might mean it does not have the resources needed to properly do its duties as the elected master, which could hamper other operations in the cluster that require coordination with the master (index creation, mapping updates, etc).
I think the first point is more important, and would generally recommend prioritising three master-eligible nodes first, even if some of them are mixed master/data nodes.
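To make the availability argument concrete, here is a minimal sketch of the majority arithmetic behind it. It assumes a simplified model in which a master can be elected only while more than half of the master-eligible nodes are reachable; Elasticsearch's real voting configuration handling is more involved, and the function names are purely illustrative.

```python
def majority(master_eligible: int) -> int:
    """Smallest number of master-eligible nodes that forms a majority."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - majority(master_eligible)


# Simplified model only: compare one, two and three master-eligible nodes.
for n in (1, 2, 3):
    print(f"{n} master-eligible: majority={majority(n)}, "
          f"can lose {tolerated_failures(n)}")
```

Under this model, one or two master-eligible nodes tolerate no losses, while three tolerate the loss of one, which matches the reasoning above.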
Thanks, David, for your quick response!
As per your suggestion, I would like to go with 3 master-eligible nodes.
How about this approach: since I have a 3-node cluster, I would make one node a dedicated master and the other two master-eligible data nodes.
Do you see any drawbacks to this approach?
Hi @DavidTurner
In a similar context, suppose I have a 3-node cluster.
Let's say Node A is a dedicated master, and Nodes B and C are master-eligible data nodes.
If Node A goes down, Node B or C would become the master (let's say Node B).
When Node A comes back and rejoins the cluster, Node B is already the master, so Node A cannot be the master. This would mean Node A is of no use from that point on, since it is neither the master nor able to hold any data.
It would also mean more load on Node B, as it has to perform both master and data-related tasks.
Is my assumption that Node A is of no use correct?
No, there's no need for that, because each master-eligible node must be able to assume the burden of being the master at any time. If the mixed data/master nodes are running close to the limit then you should move to a configuration with three dedicated master nodes.
Being the master node takes a bit of extra resources (some heap, some CPU, some network bandwidth, etc.). It's normally not a lot, but if you need to know precisely how much, you will need to measure this in your own environment, since it depends very much on how your cluster is set up.
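As a rough starting point for that kind of measurement, the sketch below compares JVM heap usage per node between two snapshots of the nodes stats API response (`GET _nodes/stats/jvm`), for example one taken while a node is the elected master and one while it is not. The helper names and the sample numbers are made up for illustration, and a single pair of snapshots is noisy because heap usage moves with garbage collection, so in practice you would want many samples.

```python
def heap_by_node(stats: dict) -> dict:
    """Map node name -> JVM heap used percent from a nodes-stats response body."""
    return {
        node["name"]: node["jvm"]["mem"]["heap_used_percent"]
        for node in stats["nodes"].values()
    }


def heap_delta(before: dict, after: dict, node_name: str) -> int:
    """Change in heap used percent for one node between two snapshots."""
    return heap_by_node(after)[node_name] - heap_by_node(before)[node_name]


# Hypothetical snapshots (illustrative values, not real measurements).
not_master = {"nodes": {
    "id1": {"name": "node-b", "jvm": {"mem": {"heap_used_percent": 41}}},
    "id2": {"name": "node-c", "jvm": {"mem": {"heap_used_percent": 39}}},
}}
as_master = {"nodes": {
    "id1": {"name": "node-b", "jvm": {"mem": {"heap_used_percent": 46}}},
    "id2": {"name": "node-c", "jvm": {"mem": {"heap_used_percent": 40}}},
}}

print(heap_delta(not_master, as_master, "node-b"))
```

The same approach could be extended to CPU or network figures from the stats response, but which metrics matter will depend on your own cluster.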