Can someone let me know how many shards each node should hold to prevent under/over-utilization of an ES cluster with 3 dedicated data nodes, 1 dedicated master node and 1 dedicated client node? The cluster is expected to hold Logstash indices from 5 different services. An index is created per day for each service and we keep one week's worth of data, so the cluster is expected to hold about 35 open indices at a time. The average size of each index will be about 500 MB. Per the docs, 500 MB is not a big size, so it probably does not make sense to go with the default of 5 shards per index.
I know that there is no magic formula for this and that it depends on the nature of the documents, the index and the search queries. What I am trying to ask is: to get started, do people follow any heuristics for determining the number of shards per index, given the number of indices and the average size of each index?
In order to keep the number of shards down, you ideally want your shards reasonably large. A shard size of a few GB up to tens of GB is not uncommon for logging use cases. In order to increase the shard size, you can adjust the time period each index covers, e.g. by using weekly or monthly indices instead of the default daily pattern, as well as the number of shards per index.
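If you want to try weekly indices, a minimal sketch of the Logstash elasticsearch output might look like this (the hosts value is just a placeholder; adjust it for your own cluster):

```
output {
  elasticsearch {
    # placeholder host; point this at your own cluster
    hosts => ["localhost:9200"]
    # ISO week-based index name (e.g. logstash-2016.37)
    # instead of the default daily logstash-%{+YYYY.MM.dd}
    index => "logstash-%{+xxxx.ww}"
  }
}
```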
You should probably have 3 master-eligible nodes; you can make the current 3 dedicated data nodes master + data. If each index is only 500 MB, then 1 to 3 primary shards and 1 replica is probably enough. If you are doing daily indices and see your data growing, you can increase the number of shards on new indices and delete old indices based on your retention policy. Also, if you don't really need daily indices, you can switch to weekly or monthly indices.
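To apply a lower shard count to new Logstash indices, one rough sketch is an index template like the one below. The template name is arbitrary, the `template` field is the pre-6.0 syntax (6.x and later use `index_patterns`), and `order: 1` is there so these settings take precedence over the default Logstash template:

```
PUT _template/logstash_small_shards
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

Note that this only affects indices created after the template is installed; existing indices keep their current shard count.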
For each index I would not have more primary shards than data nodes.