I will have around 20 GB of logs to process on a daily basis, and I would like to keep this data for at least a month. If the data I want to keep in Elasticsearch is around 150 GB, I am wondering whether Elasticsearch will scale to handle this amount of data. If yes, how many nodes do I need in the cluster?
150 GB is fine, and so is 20 GB * 31 days = 620 GB. You'll probably want to use a replica (so about 1.24 TB), and the logs may take up more space once they are indexed (roughly 1.6 TB). All of which is fine.
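To make the arithmetic explicit, here's a rough back-of-the-envelope version (a sketch; the 1.3x on-disk expansion after indexing is an assumption you should check against your own mappings and data):

```python
# Rough capacity estimate for retained log data.
# Assumptions: 1 replica, ~1.3x expansion once the logs are indexed.
daily_gb = 20
retention_days = 31
replicas = 1
index_overhead = 1.3  # assumed expansion factor - verify with your own data

raw_gb = daily_gb * retention_days            # 620 GB of raw logs
with_replica_gb = raw_gb * (1 + replicas)     # ~1240 GB
total_gb = with_replica_gb * index_overhead   # ~1610 GB on disk

print(f"raw: {raw_gb} GB, with replica: {with_replica_gb} GB, "
      f"estimated on disk: {total_gb:.0f} GB")
```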
Generally the minimum suggested deployment for good stability is 3 nodes. You could certainly think about 3 nodes, each with two 350 GB SSDs in RAID 0, or more nodes with smaller disks. You can also go with spinning disks for your older data if you think you won't query it as much. That is called a hot/warm architecture.
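If you go the hot/warm route, the idea is to tag nodes with an attribute (e.g. `node.attr.box_type: hot` or `warm` in elasticsearch.yml) and move older indices onto the warm nodes by updating their allocation setting. A minimal sketch with the Python client (the index name and the `box_type` attribute are just example names, and the exact client call can vary a bit between versions):

```python
# Sketch: relocate an older daily index from hot to warm nodes,
# assuming nodes were started with node.attr.box_type set to "hot" or "warm".
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical daily index that has aged out of the "hot" window.
old_index = "logs-2017.01.01"

es.indices.put_settings(
    index=old_index,
    body={"index.routing.allocation.require.box_type": "warm"},
)
```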
The best thing is to experiment with your own data, though. Load chunks of it onto a development node and see how the numbers I predicted work out. Also look at ingest rate - you'll need to be able to index your 20 GB per day.
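For the ingest-rate check, the bulk helper makes it easy to time how fast a dev node can index a sample of your logs and extrapolate from there (the file path and index name below are placeholders, and the action format may need tweaking for your Elasticsearch version):

```python
# Sketch: time a bulk load of sample log lines to estimate indexing throughput,
# then extrapolate to see whether the node can keep up with ~20 GB/day.
import time
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

def actions(path, index_name):
    # Yield one index action per log line; adjust the _source fields to
    # match however you actually parse your logs.
    with open(path) as f:
        for line in f:
            yield {"_index": index_name, "_source": {"message": line.rstrip("\n")}}

start = time.time()
ok, _errors = bulk(es, actions("sample.log", "logs-test"), chunk_size=1000)
elapsed = time.time() - start
print(f"indexed {ok} docs in {elapsed:.1f}s ({ok / elapsed:.0f} docs/s)")
```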