Need help scaling up my Elasticsearch-Logstash-Graylog setup

Hi All,

My task is to set up a centralized log analysis tool that can accommodate 500 GB of log files and search through them within seconds. So I put together a basic setup of Graylog with Elasticsearch and Logstash.
To start, I read one log file using Logstash and stored it in Elasticsearch, and I am able to visualize the data in the Graylog web interface.
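For reference, my current Logstash pipeline is roughly like this (the file path and index name here are just placeholders, not my exact config):

```
# Minimal sketch of the single-file test pipeline (paths/names are examples)
input {
  file {
    path => "/var/log/app/app.log"      # the one log file I tested with
    start_position => "beginning"       # read the file from the start
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]         # single Elasticsearch node for now
    index => "logs-%{+YYYY.MM.dd}"      # one index per day
  }
}
```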

Now I need to scale up the setup so that I can index 500 GB of data in Elasticsearch.
Is it possible to read this many files through Logstash and index them into Elasticsearch?
Will MongoDB be helpful in this scenario?
How should I scale up my architecture in terms of Elasticsearch nodes, RAM, CPU power, ES heap size, and so on, so that I can meet the requirements of the task?

I currently have 4 GB RAM in my VM.
CentOS 7
Elasticsearch: v 1.7.5
Logstash: v 2.2.2
Graylog: v 1.3

Kindly help me with my questions. I am very new to this environment.
Thanks.

Hi,

Yes, Elasticsearch can handle huge volumes of data for sure. The key is choosing good hardware, and it also depends on how many shards per index you use, how many replicas per shard, and whether the data will grow in the future.

How many dedicated master nodes and how many data nodes you choose will also determine your ES cluster's stability and search capability, since you are dealing with a huge amount of data.
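For example, here is a rough sketch of how node roles and default shard/replica counts can be set in elasticsearch.yml on ES 1.x (the values are examples only; you have to tune them for your own data):

```
# elasticsearch.yml -- example values only

# On a dedicated master node:
node.master: true
node.data: false

# On a data node:
node.master: false
node.data: true

# Default shard/replica settings applied to newly created indices:
index.number_of_shards: 5
index.number_of_replicas: 1
```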

Thanks
phani

Thanks @Phani_Nadiminti for replying.

The data will grow and can reach up to 1 TB.
As my focus is on high search capability, I am planning on 15 primary shards and 2 replicas.
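For example, I was thinking of creating each index roughly like this (the index name here is just an example):

```
curl -XPUT 'http://localhost:9200/logs-2016.03.01' -d '{
  "settings": {
    "number_of_shards": 15,
    "number_of_replicas": 2
  }
}'
```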

With 1 master node and 2 data nodes, is it fine to deploy the setup (if all the nodes have 8 GB RAM)?
I am not sure about the number of nodes to use. Can you please advise? I don't want to get stuck in the future because of any of this.
Thanks a lot.