Max data that can be stored based on memory configured

Hi,

I was going through the blog post on shard counts in a cluster: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

One of the tips says: "A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better."
So if I assume an average shard size of 20GB, can a node with a 30GB heap store 600 shards x 20GB = 12TB of data without issues?
Please help me understand this.

Thanks,

A node may hold more or less than that depending on the data, mappings and workload. These are all rough guidelines around best practices, and I created this blog post as I saw a lot of users ending up in serious trouble due to having large volumes of small shards, which for many use cases is very inefficient.
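
For reference, the arithmetic in the question can be written out as a small sketch. The function names are made up for illustration, the 20-shards-per-GB figure is the rule of thumb quoted from the blog post, and 1 TB is taken as 1000 GB to match the 12 TB in the question. The result is only what the rule of thumb implies as a ceiling, not a guarantee that a node will handle that much data.

```python
# Hypothetical helpers; not part of any Elasticsearch API.

def max_shards(heap_gb: float, shards_per_gb_heap: int = 20) -> int:
    """Upper bound on shards per node from the blog's rule of thumb."""
    return int(heap_gb * shards_per_gb_heap)


def max_data_tb(heap_gb: float, avg_shard_gb: float,
                shards_per_gb_heap: int = 20) -> float:
    """Data volume implied by that shard ceiling (assumes 1 TB = 1000 GB)."""
    return max_shards(heap_gb, shards_per_gb_heap) * avg_shard_gb / 1000


if __name__ == "__main__":
    print(max_shards(30))        # shard ceiling for a 30GB heap
    print(max_data_tb(30, 20))   # implied data volume in TB at 20GB per shard
```

As the reply above notes, the real limit depends on the data, mappings and workload, so a figure like this is best treated as a starting point for testing.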

Thanks for the reply.

I was looking for sizing guidance for setting up an Elasticsearch cluster (number of nodes, memory, disk, etc.) for, say, x indices of y GB each, kept for a given retention period.
Any guidance on that would be very helpful.

Thanks & Regards,

It will depend on the use case, but assuming logs and/or metrics the following resources may help:

Thank you very much :)

Hi, I have a small question. In the post https://www.elastic.co/blog/sizing-hot-warm-architectures-for-logging-and-metrics-in-the-elasticsearch-service-on-elastic-cloud the Disk:RAM ratio is 30:1, while the webinar on Quantitative Cluster Sizing suggests a ratio of 16:1. Is this difference due to the version of ES used, or is there another reason?

Thanks,

This will depend a lot on the use case and how much you index per day compared to the retention period on the nodes. The webinar is quite old and that also plays a part to some extent.
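
To make the effect of the two ratios concrete, here is a rough sketch. The 64GB node, the 100GB/day ingest rate and the 30-day retention are made-up example numbers, and the function names are hypothetical. It ignores replicas, index overhead and disk watermarks, so it only shows how the ratio changes the node count, not how to size a real cluster.

```python
import math

# Hypothetical helpers; not part of any Elasticsearch API.

def disk_per_node_gb(ram_gb: float, disk_to_ram_ratio: float) -> float:
    """Disk a node can serve at a given Disk:RAM ratio."""
    return ram_gb * disk_to_ram_ratio


def nodes_needed(daily_index_gb: float, retention_days: int,
                 ram_gb: float, disk_to_ram_ratio: float) -> int:
    """Nodes required to hold the retained data (no replicas, no overhead)."""
    total_gb = daily_index_gb * retention_days
    return math.ceil(total_gb / disk_per_node_gb(ram_gb, disk_to_ram_ratio))


if __name__ == "__main__":
    # Assumed example: 64GB RAM nodes, 100GB indexed per day, 30-day retention.
    for ratio in (30, 16):
        print(ratio, disk_per_node_gb(64, ratio), nodes_needed(100, 30, 64, ratio))
```

With these assumed inputs the 16:1 ratio needs more nodes than 30:1 for the same retained volume, which is why the daily volume and retention period matter when choosing between them.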