In any cluster you generally want to have 3 master-eligible nodes (with minimum_master_nodes set to 2 to avoid split brain scenarios). This will allow the cluster to stay responsive even if it loses one node.
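As a rough illustration (assuming a pre-7.x cluster where this setting still exists, and the Python elasticsearch-py client; the host name is a placeholder), the quorum can be applied dynamically like this:

```python
# Minimal sketch: with 3 master-eligible nodes, a quorum of 2 avoids split brain.
# Host name is hypothetical; setting applies to pre-7.x Elasticsearch only.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://node-1:9200"])

es.cluster.put_settings(body={
    "persistent": {
        # (number of master-eligible nodes / 2) + 1
        "discovery.zen.minimum_master_nodes": 2
    }
})
```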
If you have a dedicated master node, it can generally be less powerful than the other nodes, and it should be left to just manage the cluster and not serve traffic.
Whether coordinating-only nodes help or not depends on the use case and workload.
If you have a single coordinating-only node that you send all requests to, this becomes a single point of failure, so 2 is generally better (if you need them at all).
It is important to have 3 master-eligible nodes, so unless you already have 3 dedicated master nodes this is generally a good idea.
Often this is fine, but as I mentioned earlier it depends on the use case.
Coordinating-only nodes if you have them in the cluster, otherwise data nodes.
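For example, something along these lines with the Python client (host names are placeholders), so that requests go to the coordinating-only nodes and fan out from there to the data nodes:

```python
# Sketch only: point the client at the coordinating-only nodes so they
# handle scatter/gather and the data nodes only serve shard-level work.
from elasticsearch import Elasticsearch

es = Elasticsearch([
    "http://coord-1:9200",  # hypothetical coordinating-only node
    "http://coord-2:9200",  # second one avoids a single point of failure
])
```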
With your inputs we have a plan:
1 Master node
2 Coordinating nodes (eligible masters)
2 Data nodes (One Replica)(RAID-0)
Avg doc size = 4 KB
Total docs = 250 million
One index.
Currently 6 shards looks optimal for our use case.
We use routing while indexing and searching (rough sketch below).
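Roughly along these lines (sketch with the Python elasticsearch-py client; index name, routing key, and field names are placeholders):

```python
# Rough sketch of routed indexing and searching (7.x-style API; names made up).
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://coord-1:9200"])

# Index with an explicit routing value so all docs for one customer
# land on the same shard.
es.index(
    index="docs",
    id="doc-1",
    routing="customer-42",
    body={"customer_id": "customer-42", "text": "example document"},
)

# Search with the same routing value so only that shard is queried.
es.search(
    index="docs",
    routing="customer-42",
    body={"query": {"term": {"customer_id": "customer-42"}}},
)
```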
A few more queries:
Can we increase search throughput by adding more data nodes and more replicas (see the sketch below)?
Would this infrastructure impact indexing throughput?
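For context, I mean something along these lines (sketch with the Python client; the index name is a placeholder), raising the replica count after adding data nodes so the extra copies can serve searches:

```python
# Sketch only: number_of_replicas is a dynamic index setting, so it can be
# raised after new data nodes join (index name is a placeholder).
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://coord-1:9200"])

es.indices.put_settings(
    index="docs",
    body={"index": {"number_of_replicas": 2}},
)
```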
Thanks again. One last thing I was wondering about.
In the past we faced an out-of-memory kill from the kernel, which killed the Java process on the master node, and we had to allocate more RAM to the OS to continue operations. This was when we were directing requests to the master.
Query: Why was the OS memory of the master node being used, when the master does not perform Lucene operations that cache segments? All my data operations happen on the data nodes; the master node is just scattering and gathering docs?
Our master node was an 8-core machine with 32 GB of RAM.
Allotted heap space was the recommended 50%, i.e. 16 GB.
The space available for the OS was the remaining 16 GB.
We were serving 3000 ES requests per second, that's about 50 MB of docs per second.
The kernel OOM killer killed our master node's Java process and the cluster went down. Note: Java was not out of heap; the OS did not have enough memory and killed Java to free up space.
Our solution:
We have upgraded the existing machine to 16 cores and 64 GB.
Allotted heap space is now 16 GB and the OS gets the remaining 48 GB to play with.
Now things are good.
Query Again:
Does the master do any kind of Lucene-related operations? AFAIK it just does QUERY and FETCH.