Data node calculation based on "Number of shards per node below 20 per GB heap it has configured"

I have the following scenario:
The data volume of an index is 85GB/day, with daily index rotation and a retention of 90 days.
Similarly, I have 14 other indices with the same data volume per day.

I have a system with 64GB of RAM and 12 CPU cores, where the heap is configured to 28GB (keeping the allocation below 50% of available RAM).

Based on the statement below, 28GB × 20 = 560 shards can be allocated per node. But I am not sure whether this applies only to primary shards or includes replicas as well.

A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better.

And the number of data nodes is calculated as:

  • Sum of the whole index size over the retention period = 85GB × 90 days × 14 sources = 107100GB
  • No. of primary shards (capping shard size at the max. of 50GB) = 107100/50 = 2142 shards
  • Total no. of shards with one replica each = 2142 primary shards × 2 = 4284
  • No. of data nodes needed considering only primary shards = 2142/560 ≈ 4 data nodes
  • Considering primary & replica shards, total no. of data nodes = 4284/560 ≈ 8 data nodes
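The calculation above can be written out as a small Python sketch; all figures are the assumptions stated earlier (85GB/day per index, 14 sources, 90-day retention, 50GB max shard size, 28GB heap, 20 shards per GB heap):

```python
import math

# Assumed inputs, taken from the scenario described above
daily_gb_per_index = 85
sources = 14
retention_days = 90
max_shard_gb = 50
heap_gb = 28
shards_per_gb_heap = 20  # rule-of-thumb ceiling, not a target

total_gb = daily_gb_per_index * sources * retention_days        # 107100
primary_shards = math.ceil(total_gb / max_shard_gb)             # 2142
total_shards = primary_shards * 2                               # primaries + 1 replica = 4284

max_shards_per_node = heap_gb * shards_per_gb_heap              # 560
nodes_primaries_only = math.ceil(primary_shards / max_shards_per_node)  # 4
nodes_with_replicas = math.ceil(total_shards / max_shards_per_node)     # 8
```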

Can I go with 4 data nodes instead of 8 due to capacity constraints?

I assume replica shards are mostly in a passive state and may not consume resources. Please correct me if I am wrong.

Replicas also take up resources, so they are included in the count.

No, this is not correct.

It is important to note that the recommendations depend a lot on which version of the stack you are using. In the most recent versions a lot of improvements have been made that alter the calculation, as noted in the blog post you referenced.

If you are on an older version it is important to note that the blog post recommends a maximum number of shards. The reason for this is that a lot of users end up with a huge number of small shards which is very inefficient. Note that the blog post does not suggest that a node will be able to handle the maximum recommended number of shards at any particular size.

In older versions you are very likely to start experiencing heap pressure well before reaching 20 shards per GB of heap if those shards are 50GB each.

Assume you have 15 streams ingesting 85GB of data per day each, you keep this for 90 days, and you have a replica configured for some resiliency. If the data takes up the same size on disk once indexed (a simple assumption), you end up with around 230TB of data.

If, as in your calculation, you use 8 data nodes, each of them needs to hold almost 29TB of indices. Given that the nodes will need to serve queries as well as index data, I think this is wildly unrealistic. I would expect you to need considerably more data nodes. I would recommend having a look at the following resources for further information:
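Under those assumptions (15 streams × 85GB/day × 90 days × 2 copies, disk size ≈ raw size), the rough totals work out as:

```python
# Sketch of the totals above; 8 nodes is the node count from the earlier calculation
total_gb = 85 * 15 * 90 * 2   # 229500GB, i.e. around 230TB
per_node_gb = total_gb / 8    # 28687.5GB, almost 29TB per data node
```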


Thanks @Christian_Dahlqvist for your time and the benchmarking document.
I agree that both primary and replica shards count toward the per-GB-heap calculation. That was a miss on my part.

The ES version I use is 7.17.

My initial calculation assumed the max. number of shards possible per node, in my case 560 (28GB heap), and the max. shard size of 50GB. I know that's extreme; it was all put in just for the calculation.

I based those figures (the per-GB-heap restriction and the max. shard size, taken from elastic.co) on the assumption that the query load and indexing volume may be much higher. I also left 50% of RAM for OS caching.

But I am a bit confused by the calculation in the mentioned blog.

The typical memory:data ratio for the hot zone used is 1:30 and for the warm zone is 1:160.

A 1:30 memory:disk ratio for a hot data node limits it to a max. of about 2TB of disk per node (64GB RAM × 30 ≈ 1.9TB).

If I go with my previous calculation, considering a single shard size of 40GB:
230TB / 40GB = 5750 total shards.
5750 / 500 shards per node = 11.5 ≈ 12 data nodes, and 500 × 40GB = 20TB, far higher than the 2TB size.
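As a quick sanity check in Python (assuming the ~230TB total from earlier and 500 shards per node, staying under the 560 ceiling):

```python
import math

total_gb = 230_000        # ~230TB total data, from the earlier estimate
shard_gb = 40             # assumed single shard size
shards_per_node = 500     # assumed, staying below the 560 heap-based ceiling

total_shards = math.ceil(total_gb / shard_gb)        # 5750
nodes = math.ceil(total_shards / shards_per_node)    # 12
disk_per_node_gb = shards_per_node * shard_gb        # 20000GB = 20TB, vs. the ~2TB ratio limit
```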

But considering hot and warm node architecture and memory-disk ratio,

Hot zone = 7 days.
Warm zone = 83 days.

Hot zone
Total data (GB) = 85 × 15 × 7 × (1+1) = 17850GB
Total Storage (GB) = 17850 × 1.25 = 22313GB
Total Data Nodes = ROUNDUP(22313/64/30) + 1 = 13 nodes.

Warm zone
Total data (GB) = 85 × 15 × 83 × (1+1) = 211650GB
Total Storage (GB) = 211650 × 1.25 = 264563GB
Total Data Nodes = ROUNDUP(264563/64/160) + 1 = 27 nodes.

Total of 40 data nodes.
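The zone formula above (Total Data Nodes = ROUNDUP(storage / RAM / ratio) + 1) can be checked with a small Python helper; the inputs are the same assumptions as in the calculation (85GB/day, 15 sources, 1 replica, 64GB RAM nodes, 25% storage overhead):

```python
import math

def zone_nodes(daily_gb, sources, days, replicas, ram_gb, mem_disk_ratio, overhead=1.25):
    """Data-node estimate per zone: ROUNDUP(total storage / RAM / memory:disk ratio) + 1."""
    total_data_gb = daily_gb * sources * days * (1 + replicas)
    total_storage_gb = total_data_gb * overhead
    return math.ceil(total_storage_gb / ram_gb / mem_disk_ratio) + 1

hot = zone_nodes(85, 15, days=7, replicas=1, ram_gb=64, mem_disk_ratio=30)     # 13
warm = zone_nodes(85, 15, days=83, replicas=1, ram_gb=64, mem_disk_ratio=160)  # 27
total = hot + warm                                                             # 40
```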

  • But based on this calculation, either the single shard size or the shards-per-GB-heap figure is presumably too low.
  • I hope I can still decide the no. of shards based on the current day's index size,
    e.g., at 40GB per shard and an 85GB index size => 85/40 = 3 primary shards + 1 replica for each shard.
  • I am now torn between these two approaches; could you please advise?
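That per-index sizing, as a quick sketch (the 40GB target shard size is my assumption, not a hard limit):

```python
import math

index_gb = 85            # one day's index for a single source
target_shard_gb = 40     # assumed target shard size

primaries = math.ceil(index_gb / target_shard_gb)   # 3 primary shards
total_shards = primaries * 2                        # 6 shards with 1 replica each
```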

My stack basically ingests logs from various sources, accessed via Kibana for tracing and tracking. I also keep a couple of dedicated coordinating nodes and 3 master nodes. Everything runs as on-prem VMs.

Found a similar thread: Clarification about recommended memory-disk ratio of 1:30 - #2 by DavidTurner

The amount of data a specific type of node can handle is generally driven by either the amount of heap available and/or performance requirements.

Every shard stored on a node uses up a certain amount of heap that is somewhat proportional to its size. This can vary a lot based on e.g. mappings, settings and merging status, as well as the Elasticsearch version (newer versions are more efficient). Not all heap is available for shard overhead, however, as heap is also required for indexing, querying, management and other internal tasks.

In a hot-warm architecture all indexing is generally performed against the most recent indices located on the hot nodes, which means that warm nodes generally do not perform any indexing. It is also often assumed that the most recent data (held on hot nodes) is queried most frequently.

As hot nodes perform all indexing and a significant portion of querying, they need a large portion of heap set aside for this. The fact that they perform all indexing also means that they require really fast storage, which is why they often hold a limited amount of data, as in the example in the blog post.

A warm node serves fewer queries and does not index, so it can set aside a lot more heap for stored data and therefore hold more of it.

Be aware that the more data a node holds, the longer it may take to query that data. If you have strict query latency requirements, latency rather than heap may be what limits how much a data node can hold.

Thanks @Christian_Dahlqvist

Christian, you said this twice in this thread. Are you talking about 7.17 vs 8.x?

Yes. I would recommend using the latest Elasticsearch 8.x if you are just starting out.


No, I am not just starting out, but we are getting bigger by the day, hence thinking ahead.