In one of their tips they say: "A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better."
So even if I assume an average shard size of 20GB, can a node with a 30GB heap store 600 shards x 20GB = 12 TB of data without issues?
Please help me understand this.
A node may hold more or less than that depending on the data, mappings and workload. These are all rough guidelines around best practices, and I created this blog post because I saw a lot of users ending up in serious trouble due to having large numbers of small shards, which for many use cases is very inefficient.
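To make the arithmetic behind the question explicit, here is a minimal sketch of the rule-of-thumb calculation. The heap size and average shard size are just the figures from the question, not recommendations, and the result is a ceiling on shard count, not a promise that a node can comfortably serve that much data:

```python
# Illustrative only -- values taken from the question above, not sizing advice.
heap_gb = 30                     # heap configured on the node
max_shards = heap_gb * 20        # guideline ceiling: at most 20 shards per GB of heap -> 600

avg_shard_size_gb = 20           # assumed average shard size from the question
implied_data_tb = max_shards * avg_shard_size_gb / 1000   # 600 * 20 GB = 12 TB

print(max_shards, implied_data_tb)
```

Note that the 12 TB figure only falls out of multiplying the shard ceiling by an assumed shard size; whether a node can actually hold and serve that much data depends on the data, mappings and workload, as noted above.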
I was looking for dimensioning guidance for setting up an Elasticsearch cluster (number of nodes, memory, disk, etc.) for, say, x indices of y GB each with a given retention period.
So any guidance on that would be very helpful.
This will depend a lot on the use case and how much you index per day compared to the retention period on the nodes. The webinar is quite old and that also plays a part to some extent.
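There is no single formula, but a very rough starting point is to estimate raw storage from daily ingest and retention, then work backwards to a node count. The sketch below is only an illustration under stated assumptions; the ingest rate, expansion factor, replica count, disk size and headroom are hypothetical placeholders you would need to measure for your own data:

```python
import math

# Very rough capacity estimate -- every input here is an assumption to validate
# against your own data, mappings and workload.
daily_ingest_gb = 100      # raw data indexed per day (assumed)
retention_days = 30        # how long indices are kept (assumed)
expansion_factor = 1.1     # on-disk size vs raw size; varies with mappings (assumed)
replicas = 1               # replica copies per primary (assumed)
disk_per_node_gb = 2000    # usable disk per data node (assumed)
headroom = 0.7             # keep ~30% free for merges, watermarks, growth (assumed)

total_storage_gb = daily_ingest_gb * retention_days * expansion_factor * (1 + replicas)
nodes_for_storage = math.ceil(total_storage_gb / (disk_per_node_gb * headroom))

print(f"~{total_storage_gb:.0f} GB total, ~{nodes_for_storage} data node(s) for storage alone")
```

This only covers disk capacity; indexing throughput, query load, heap pressure and the shard-count guideline discussed above can all push the required node count higher.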