I have checked the Elasticsearch docs and many blogs, and this is my understanding.
Please correct me if this calculation is wrong:
Daily data volume: 100 GB
Retention period: 1 year
Total disk space: 100 GB x 365 x 2 = 73,000 GB (73 TB)
For such a huge volume I am planning to have 3 data nodes of 25 TB each.
Any suggestions here, please?
25 TB is a lot for a single data node. For this kind of volume I recommend a hot/warm architecture: a couple of hot data nodes with fast SSDs holding the last few days of data, plus some warm nodes with bigger disks to make up the balance of the space. Queries over the past few days stay quick because that data sits on the fast SSDs; the warm nodes will be slower but still searchable reasonably quickly. They won't have nearly the same disk cache hit rate because of their higher disk-to-RAM ratio, but that is life.
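For readers who haven't set up hot/warm before, the mechanics boil down to a node attribute plus an index-level allocation filter. A minimal sketch of the settings involved, shown as plain Python data for clarity; the attribute name `box_type` and the index handling are conventions from Elastic's hot/warm write-ups, not something specific to this thread:

```python
# Hot/warm allocation sketch. "box_type" is an illustrative attribute name;
# any custom node attribute works the same way.

# Set in elasticsearch.yml on each node (shown here as plain data):
hot_node_config = {"node.attr.box_type": "hot"}    # the SSD nodes
warm_node_config = {"node.attr.box_type": "warm"}  # the big-disk nodes

# Applied to an index (PUT /<index>/_settings) while it is still fresh,
# so its shards are only allocated to hot nodes:
pin_to_hot = {"index.routing.allocation.require.box_type": "hot"}

# Once the index ages past the "last few days" window, re-point it at the
# warm tier; Elasticsearch then relocates its shards to the warm nodes.
move_to_warm = {"index.routing.allocation.require.box_type": "warm"}

for label, settings in [("fresh index", pin_to_hot), ("aged index", move_to_warm)]:
    print(label, "->", settings)
```

The daily re-pointing step is usually automated (e.g. with a curator job or index lifecycle policy) rather than done by hand.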
Thanks for the response, Nik. Makes absolute sense to me. For now we will be using these nodes only for storage. Maybe in the future we will introduce Kibana as well for searching and visualization; at that point I would go for more nodes in a hot/warm architecture, plus some dedicated master nodes. I guess three nodes are fine to start with? My reason for posting this question is the replica shard: do I need to double the space because of 1 replica? Like I mentioned:
100 GB/day x 365 days x 2 (1 primary + 1 replica) = 73,000 GB = 73 TB
If 100 GB of raw data took up exactly 100 GB on disk, that would be the case. That is however not necessarily so, as it is recommended that you optimise your mappings and index settings. If you are looking to store a lot of data per node, this blog post about sharding recommendations and this webinar about storage optimisation might be useful.
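As a concrete illustration of the kind of mapping and index-settings tuning alluded to here, a sketch of two common levers (the field names are made up for this example; what actually helps depends on your data and query patterns):

```python
# Illustrative index creation body showing settings that shrink on-disk size.
index_body = {
    "settings": {
        # best_compression trades a little CPU at index/merge time
        # for noticeably smaller segments on disk.
        "index.codec": "best_compression"
    },
    "mappings": {
        "properties": {
            # keyword instead of text: no analysis chain, no positions stored.
            "status": {"type": "keyword"},
            # A field we only ever display, never search or aggregate on,
            # so skip the inverted index and doc_values entirely:
            "raw_payload": {"type": "keyword", "index": False, "doc_values": False},
        }
    },
}

print(index_body["settings"]["index.codec"])  # best_compression
```

Depending on the data, tuning like this can bring the on-disk footprint well under the raw ingest size, which feeds straight back into the 73 TB estimate above.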
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.