Scaled ELK architecture in AWS

I'm building a scaled-out ELK setup for a production system in AWS on EC2 instances. It needs to handle about 175GB of logs a day, give or take 50GB. On average I estimate that is about 2750 events/s. The logs come from multiple app servers running on EC2 instances that have aggressive log rotation during production hours. Incremented backup logs are created when the rotation occurs. I would use Filebeat to read from the backup logs, but due to the nature of the app, information might be missed in the case of a failure.
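As a sanity check on those numbers, here is a rough sketch of the arithmetic behind them. It assumes decimal gigabytes and a load spread evenly over 24 hours, which is not quite true given the busier production hours, so the real peak rate will be higher than this suggests. The function names are made up for illustration.

```python
GB = 10**9  # assumption: decimal gigabytes
SECONDS_PER_DAY = 86_400


def avg_event_bytes(gb_per_day: float, events_per_second: float) -> float:
    """Average event size implied by a daily volume and an average event rate."""
    return gb_per_day * GB / (events_per_second * SECONDS_PER_DAY)


def events_per_second(gb_per_day: float, event_bytes: float) -> float:
    """Average event rate needed to produce a daily volume at a given event size."""
    return gb_per_day * GB / event_bytes / SECONDS_PER_DAY


if __name__ == "__main__":
    size = avg_event_bytes(175, 2750)
    print(f"implied average event size: {size:.0f} bytes")
    # Same event size, but on a heavy day (175GB + 50GB).
    print(f"average rate on a 225GB day: {events_per_second(225, size):.0f} events/s")
```

So the estimate implies events of roughly 0.7KB each, and a heavy day pushes the average rate to around 3500 events/s before accounting for bursts.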

My plan is to have Filebeat handle the multiline joining and then ship the logs from the app instances. The logs would go to (2) Logstash instances [m4.large = 2 vCPU / 8GB RAM]. Only a couple of simple grok filters need to run, one mainly for the timestamp and another for a specific value. From there the output goes to (4) Elasticsearch instances [m4.large], each with a 500GB EBS volume of standard SSD. The logs are only kept for a week, so Curator will run as a cron job to clear them out after that.
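To make the one-week retention concrete, here is a small sketch of the selection a cleanup job has to make. This is plain Python, not Curator's API, and it assumes one index per day named `logstash-YYYY.MM.DD`; both the naming and the `expired_indices` helper are assumptions for illustration.

```python
from datetime import date, datetime, timedelta

PREFIX = "logstash-"  # assumption: daily indices named logstash-YYYY.MM.DD


def expired_indices(index_names, today, keep_days=7):
    """Return the daily indices dated strictly more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue  # not one of ours, e.g. .kibana
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matches but the suffix is not a date
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = [f"logstash-2017.01.{d:02d}" for d in range(1, 11)] + [".kibana"]
    print(expired_indices(names, today=date(2017, 1, 10)))
```

Deleting whole daily indices like this is cheap compared with deleting individual documents, which is why the retention window is usually expressed in days.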

For Elasticsearch I'm planning on (3) of the nodes being master/data and (1) node being data only. I would probably run Kibana on one of the instances, or perhaps on a separate t2 instance, since I don't think it uses many resources. I'm thinking of running with 5 shards and 1 replica for ES. For host discovery I was going to use "" which would include all 4 nodes.
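For a feel of what 5 shards and 1 replica means across 4 data nodes, here is a rough sketch. It assumes one index per day kept for 7 days, and that the index on disk is about the same size as the raw logs; both are assumptions, and the real ratio depends on mappings and compression. `shard_layout` is a hypothetical helper.

```python
def shard_layout(primaries, replicas, retained_indices, data_nodes, gb_per_index):
    """Summarise shard counts and sizes for a set of equally sized indices."""
    copies_per_index = primaries * (1 + replicas)  # primaries plus their replicas
    total_copies = copies_per_index * retained_indices
    return {
        "copies_per_index": copies_per_index,
        "total_shard_copies": total_copies,
        "shard_copies_per_node": total_copies / data_nodes,
        "gb_per_primary_shard": gb_per_index / primaries,
    }


if __name__ == "__main__":
    # 5 primaries, 1 replica, 7 daily indices, 4 data nodes, 175GB per day.
    for key, value in shard_layout(5, 1, 7, 4, 175).items():
        print(f"{key}: {value}")
```

Under those assumptions that is 70 shard copies in total, about 17 or 18 per node, with each primary shard holding roughly 35GB.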

Since the log rotation is so aggressive on the app instances, I feel there is a risk of missing logs if the pipeline backs up at Logstash. In that case I would add two Redis instances between Filebeat and Logstash, but I'm going to try it without them first.

With (4) instances of ES with 500GB each and 1 replica enabled, does that mean I technically have 1TB of storage, since the other 1TB would be for the replicas?
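The arithmetic behind that question can be sketched like this. It assumes the index on disk is about as large as the raw logs and ignores free-space headroom, so treat the result as an optimistic upper bound. The function names are made up for illustration.

```python
def usable_gb(nodes: int, gb_per_node: float, replicas: int) -> float:
    """Space left for primary data when every primary has `replicas` copies."""
    return nodes * gb_per_node / (1 + replicas)


def days_that_fit(usable: float, gb_per_day: float) -> float:
    """How many days of logs fit in the usable space."""
    return usable / gb_per_day


if __name__ == "__main__":
    usable = usable_gb(nodes=4, gb_per_node=500, replicas=1)
    print(f"usable for primaries: {usable:.0f}GB")
    print(f"days at 175GB/day: {days_that_fit(usable, 175):.1f}")
    print(f"days at 225GB/day: {days_that_fit(usable, 225):.1f}")
```

With 1 replica, each primary shard is stored twice, so 2TB of raw disk holds about 1TB of primary data. At 175GB a day that is a little under six days, which is less than the planned week of retention unless the indexed data turns out smaller than the raw logs.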

Any recommendations on the setup or areas of concern?

Part of this is going to be deploying, seeing how it all performs, and adjusting as required :)
Looks good though!

Thanks for the input.

Also, when it comes to shard replicas, if I have 1 replica of every primary shard, does that mean it will use double the space?


