I have the following scenario:
The data volume of one index is 85GB/day, with daily index rotation and a retention of 90 days.
Similarly, I have 14 other indices with the same data volume per day.
Each node has 64GB of RAM and a 12-core CPU, with the heap configured to 28GB (keeping the allocation below 50% of the available RAM).
Based on the statement below, 28GB * 20 = 560 shards can be allocated per node. But I am not sure whether this limit applies only to primary shards or includes replica shards as well.
A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better.
I calculated the number of data nodes as follows:
- Sum of the whole index size for the retention = 85GB * 90 days * 14 sources = 107100GB
- No. of primary shards (assuming a maximum shard size of 50GB) = 107100/50 = 2142 shards.
- Total no. of shards with one replica for each primary = 2142 primary shards * 2 = 4284
- No. of data nodes needed considering only primary shards = 2142/560 = approx. 4 data nodes.
- Considering primary and replica shards, total no. of data nodes = 4284/560 = approx. 8 data nodes.
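For anyone who wants to check or vary the numbers, the arithmetic above can be sketched in a few lines of Python. This is only a rough sketch of the shard-count calculation from the list: the function name and parameter names are made up for illustration, it assumes every shard is filled to the 50GB maximum, and it looks only at the 20-shards-per-GB-heap rule of thumb, not at disk capacity or query load.

```python
import math

# Figures are taken from the post; the function and parameter names are
# hypothetical and only serve this illustration.
def estimate_data_nodes(daily_gb=85, retention_days=90, sources=14,
                        max_shard_gb=50, heap_gb=28,
                        shards_per_gb_heap=20, replicas=1):
    # Total data kept on the cluster over the retention window (primaries only).
    total_gb = daily_gb * retention_days * sources
    # Assumes every primary shard is filled up to the maximum shard size.
    primaries = math.ceil(total_gb / max_shard_gb)
    # Each primary has `replicas` copies in addition to itself.
    total_shards = primaries * (1 + replicas)
    # Rule-of-thumb ceiling on shards per node, based on heap size.
    shards_per_node = heap_gb * shards_per_gb_heap
    return {
        "total_gb": total_gb,
        "primaries": primaries,
        "total_shards": total_shards,
        "shards_per_node": shards_per_node,
        "nodes_primaries_only": math.ceil(primaries / shards_per_node),
        "nodes_with_replicas": math.ceil(total_shards / shards_per_node),
    }

result = estimate_data_nodes()
for name, value in result.items():
    print(f"{name}: {value}")
```

Changing `replicas`, `heap_gb` or `max_shard_gb` shows how sensitive the node count is to each assumption.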
Can I go with 4 data nodes instead of 8, given capacity constraints?
I assume replica shards are mostly in a passive state and may not consume resources. Please correct me if I am wrong.