I have a 5-node cluster: 1 dedicated master node (16 GB / 4 cores) and 4 data nodes (32 GB / 8 cores / 5 TB SSD each). I have around 10 TB of data to store in a single index on Elasticsearch (for now I am not doing time-based indexing because of my use case).
Please help me figure out how many shards I should keep for faster queries.
The first thing I would like to point out, even though it is not related to what you are asking, is that you should always aim to run at least 3 master-eligible nodes in order to avoid a single point of failure.
When it comes to shard size and count, the impact on query performance depends heavily on your data, mappings, the types of queries you run and how many concurrent queries you need to support. It is hard to give a general answer, so I would recommend running some benchmarks to find out.
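If it helps, here is a minimal sketch of the kind of benchmark I mean, using the Python client (the index names, shard counts and the match_all query are just placeholders for your own data sample and real queries, and the exact keyword arguments assume a recent 8.x client):

```python
import time
from elasticsearch import Elasticsearch

# Adjust to your cluster address / auth.
es = Elasticsearch("http://localhost:9200")

# Create test indices with different primary shard counts, load a
# representative sample of your data into each, then time the queries
# you actually run in production.
for shards in (20, 50, 100):
    index = f"benchmark-{shards}-shards"
    es.indices.create(
        index=index,
        settings={"number_of_shards": shards, "number_of_replicas": 1},
    )
    # ... bulk index a representative sample of your data here ...

    start = time.time()
    resp = es.count(index=index, query={"match_all": {}})  # replace with your real query
    elapsed = time.time() - start
    print(f"{shards} shards: counted {resp['count']} docs in {elapsed:.3f}s")
```

Comparing the same queries against the same data sample at different shard counts is usually more telling than any general rule of thumb.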
Yes... we have kept 2 master-eligible nodes, as you mentioned.
Now, coming to the query part: mostly I have count queries. I just wanted to know whether it is fine to keep 200-300 shards of around 30 GB each in the cluster, given my configuration.
You need at least 3 master eligible nodes in a cluster for high availability. 2 is not enough.
That gives around 2.5 TB per node, which should be possible. The amount of data stored per node is generally limited by query performance or heap usage. Have a look at this webinar for more details.
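To spell out the arithmetic behind those figures (a quick sketch assuming primaries only; replicas would add to the per-node totals):

```python
# Rough sanity check on the numbers in the question.
total_data_gb = 10 * 1024    # ~10 TB of data in the index
data_nodes = 4
target_shard_gb = 30         # proposed shard size

print(total_data_gb / data_nodes / 1024, "TB per data node")        # ~2.5 TB
print(total_data_gb / target_shard_gb, "primary shards in total")   # ~340 shards
print(total_data_gb / target_shard_gb / data_nodes, "shards/node")  # ~85 shards
```

So 200-300 shards of ~30 GB is in the right ballpark for 10 TB, and whether the resulting per-node load is acceptable comes back to the query performance and heap usage mentioned above.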