I have checked the Elasticsearch docs and many blogs, and this is my understanding.
Please correct me if this calculation is wrong:
Daily data volume: 100 GB
Retention period: 1 year
Total disk space: 100 GB x 365 x 2 = 73,000 GB (73 TB)
For such a huge volume I am planning to have 3 data nodes of 25 TB each.
Any suggestions here, please?
25 TB is a lot for a single data node. For this kind of volume I recommend a hot/warm architecture: a couple of hot data nodes with fast SSDs holding the last few days of data, plus some warm nodes with bigger disks to make up the balance of the space. Queries over the past few days stay quick because that data sits on the fast SSDs; the warm nodes will be slower but still searchable reasonably quickly. They won't have nearly the same disk cache hit rate because of their higher disk-to-RAM ratio, but that is life.
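For readers who haven't set up hot/warm before, the mechanics boil down to a node attribute plus an index-level allocation filter. A minimal sketch of the settings involved, shown as plain Python data for clarity; the attribute name `box_type` and the index handling are conventions from Elastic's hot/warm write-ups, not something specific to this thread:

```python
# Hot/warm allocation sketch. "box_type" is an illustrative attribute name;
# any custom node attribute works the same way.

# Set in elasticsearch.yml on each node (shown here as plain data):
hot_node_config = {"node.attr.box_type": "hot"}    # the SSD nodes
warm_node_config = {"node.attr.box_type": "warm"}  # the big-disk nodes

# Applied to an index (PUT /<index>/_settings) while it is still fresh,
# so its shards are only allocated to hot nodes:
pin_to_hot = {"index.routing.allocation.require.box_type": "hot"}

# Once the index ages past the "last few days" window, re-point it at the
# warm tier; Elasticsearch then relocates its shards to the warm nodes.
move_to_warm = {"index.routing.allocation.require.box_type": "warm"}

for label, settings in [("fresh index", pin_to_hot), ("aged index", move_to_warm)]:
    print(label, "->", settings)
```

The daily re-pointing step is usually automated (e.g. with a curator job or index lifecycle policy) rather than done by hand.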
Thanks for the response, Nik. Makes absolute sense to me. For now we will be using these nodes only for storage. Maybe in the future we will introduce Kibana as well for searching and visualization; at that point I would go for more nodes in a hot/warm architecture, plus some dedicated master nodes. I guess three nodes are fine to start with? My reason for posting this question is the replica shard: do I need to double the space because of 1 replica? Like I mentioned:
100 GB/day x 365 days x 2 (1 primary + 1 replica) = 73,000 GB = 73 TB
If 100 GB of raw data took up exactly 100 GB on disk, that would be the case. That is however not necessarily so, as it is recommended that you optimise your mappings and index settings. If you are looking to store a lot of data per node, this blog post about sharding recommendations and this webinar about storage optimisation might be useful.
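As a concrete illustration of the kind of mapping and index-settings tuning alluded to here, a sketch of two common levers (the field names are made up for this example; what actually helps depends on your data and query patterns):

```python
# Illustrative index creation body showing settings that shrink on-disk size.
index_body = {
    "settings": {
        # best_compression trades a little CPU at index/merge time
        # for noticeably smaller segments on disk.
        "index.codec": "best_compression"
    },
    "mappings": {
        "properties": {
            # keyword instead of text: no analysis chain, no positions stored.
            "status": {"type": "keyword"},
            # A field we only ever display, never search or aggregate on,
            # so skip the inverted index and doc_values entirely:
            "raw_payload": {"type": "keyword", "index": False, "doc_values": False},
        }
    },
}

print(index_body["settings"]["index.codec"])  # best_compression
```

Depending on the data, tuning like this can bring the on-disk footprint well under the raw ingest size, which feeds straight back into the 73 TB estimate above.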
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.