I have the following scenario:

Data volume of an index is 85GB/day with a daily index rotation and retention of 90 days.

Similarly I have 14 other indices with the same data volume per day.

I have a system with 64GB of RAM and a 12-core CPU. The heap is configured to 28GB (keeping the allocation below 50% of the available RAM).

Based on the statement below, 28GB * 20 = 560 shards can be allocated per node. But I am not sure whether this applies only to primary shards or includes replicas as well.

A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better.

The number of data nodes is calculated as follows:

- Sum of the whole index size for the retention = 85GB * 90 days * 14 sources = 107100GB
- No. of primary shards (considering max. size of the shard limits to 50GB) = 107100/50 = 2142 shards.
- Total no. of shards considering one replica for each primary = 2142 primary shards * 2 = 4284
- No. of data nodes needed considering only primary shards = 2142/560 = approx. 4 data nodes.
- Considering primary and replica shards, total no. of data nodes = 4284/560 = approx. 8 data nodes.
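
To make the arithmetic above easy to re-check, here is a minimal Python sketch of the same calculation. The variable names are my own, and the inputs are just the assumptions stated above (50GB max shard size, 20 shards per GB of heap, one replica per primary). It counts shards only and rounds up to whole nodes.

```python
import math

# Inputs taken from the scenario above (all names are illustrative).
daily_gb_per_index = 85      # data volume per index per day
retention_days = 90          # daily indices kept for 90 days
index_count = 14             # number of sources used in the calculation
max_shard_gb = 50            # assumed maximum size of a single shard
heap_gb = 28                 # configured JVM heap per data node
shards_per_gb_heap = 20      # rule-of-thumb limit quoted above
replicas = 1                 # one replica per primary shard

# Total data held over the retention period.
total_gb = daily_gb_per_index * retention_days * index_count

# Shard counts, with and without replicas.
primary_shards = math.ceil(total_gb / max_shard_gb)
total_shards = primary_shards * (1 + replicas)

# Per-node shard limit and resulting node counts (rounded up).
max_shards_per_node = heap_gb * shards_per_gb_heap
nodes_primary_only = math.ceil(primary_shards / max_shards_per_node)
nodes_with_replicas = math.ceil(total_shards / max_shards_per_node)

print("total data (GB):", total_gb)
print("primary shards:", primary_shards)
print("total shards:", total_shards)
print("max shards per node:", max_shards_per_node)
print("data nodes (primaries only):", nodes_primary_only)
print("data nodes (primaries + replicas):", nodes_with_replicas)
```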

Can I go with 4 data nodes instead of 8 data nodes due to capacity constraints?

I assume replica shards are mostly in a passive state and may not consume resources. Please correct me if I am wrong.