I need to store logs generated by roughly 50 apps. The average log volume is 5 GB a day, with a retention period of one month. Write speed isn't important to me, but read time is, and high availability isn't essential either.
So what do you think the right cluster configuration for this case would be?
Total data for one month of retention = 5 GB/day * 31 days = 155 GB
Total storage (GB) = 155 + (155 * 5% margin of error) ≈ 163 GB
Total data nodes = ROUNDUP(163 / memory per data node / memory:data ratio) + 1 data node for failover capacity
So a 3-node cluster should provide a reliable solution: 3 master-eligible data nodes with 1 shard per index.
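The arithmetic above can be sketched as a small calculator. Note the memory per node (16 GB) and memory:data ratio (1:10) below are assumed example values I picked for illustration; they are not stated in the thread, and with different hardware the node count changes accordingly:

```python
import math

def data_nodes(daily_gb, retention_days, margin=0.05,
               memory_per_node_gb=16, memory_to_data_ratio=10):
    """Estimate data node count using the sizing formula above.

    memory_per_node_gb and memory_to_data_ratio are assumed example
    values (16 GB nodes, 1:10 memory:data), not from the original post.
    """
    total_data = daily_gb * retention_days            # 5 * 31 = 155 GB
    total_storage = total_data * (1 + margin)         # ~163 GB with 5% margin
    # ROUNDUP(storage / memory per node / memory:data ratio) + 1 failover node
    per_node_capacity = memory_per_node_gb * memory_to_data_ratio
    return math.ceil(total_storage / per_node_capacity) + 1

print(data_nodes(5, 31))  # 3 nodes under the assumed hardware values
```

With bigger nodes or a leaner ratio (say 64 GB nodes at 1:30), the same formula collapses to a smaller cluster, so the hardware assumptions matter more than the raw data volume here.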