You should always aim to have three master-eligible nodes in a cluster, so you could set this up as one dedicated master node and two master/data nodes.
The ideal shard size and shard count will depend on the data and queries, as well as the number of concurrent queries (assuming this is a query-heavy use case). I would recommend running some tests/benchmarks to determine the optimal configuration for your use case. Have a look at this Elastic{ON} talk for details.
This is also very closely tied to the data, queries, and expected load, so it is best answered through benchmarking.
Thanks for the quick response. I'm sorry I was not clear in describing my topology. We have three master-eligible nodes, but only two of them are data nodes. The third, master-only node will serve only to take part in electing a new master and will be piggybacking on a different server.
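For reference, a dedicated master-only node like that can be configured roughly as follows in `elasticsearch.yml`. This is a sketch assuming Elasticsearch 6.x-style settings; newer versions (7.x+) replace these booleans with a single `node.roles: [ master ]` list.

```yaml
# elasticsearch.yml on the dedicated master-only node (6.x-style settings)
node.master: true    # eligible to be elected master
node.data: false     # holds no shard data
node.ingest: false   # does not run ingest pipelines

# elasticsearch.yml on the two master/data nodes
# node.master: true
# node.data: true
```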
Would it be a valid benchmark to shrink the data volume and the cluster resources proportionally, run the benchmark, and then extrapolate the results? Or do we need to benchmark with the actual volume of data?
I saw on the internet that the ideal shard size should be 20-40 GB. Is this true?
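Treating that 20-40 GB figure as a rule of thumb rather than a hard rule, here is a minimal sketch of how it translates into a primary-shard count for an index. The function name, the 30 GB default, and the 500 GB example size are all illustrative assumptions, and the result should be validated by benchmarking as suggested above.

```python
import math

def shards_for_index(total_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Rough primary-shard count for an index of a given size.

    target_shard_gb defaults to 30 GB, the middle of the commonly
    quoted 20-40 GB range; treat the result as a starting point for
    benchmarking, not an optimum.
    """
    return max(1, math.ceil(total_size_gb / target_shard_gb))

# e.g. a 500 GB index at ~30 GB per shard
print(shards_for_index(500))  # -> 17
```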