Scaled ELK architecture in AWS

sila5 · March 31, 2017, 6:34pm

I'm building a scaled-out ELK setup for a production system in AWS on EC2 instances. It needs to handle about 175GB of logs a day, give or take 50GB. On average I estimate this is about 2750 events/s. The logs come from multiple app servers running on EC2 instances that have an aggressive log rotation during production hours. Incremented backup logs are created when the rotation occurs. I would use Filebeat to read from the backup logs, but due to the nature of the app, information might be missed in the case of a failure.
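As a quick sanity check on those figures, the sketch below derives the average event size implied by 175GB/day at 2750 events/s (roughly 0.7KB per event) and the event rates at the low and high ends of the stated range. It assumes 1GB = 10^9 bytes and a load spread evenly over 24 hours, which is an assumption rather than something stated here; with traffic concentrated in production hours, peak rates would be higher. The function names are illustrative only.

```python
# Back-of-the-envelope check of the daily volume and event rate.
# Assumptions (not from the post): 1 GB = 10**9 bytes, and the load is
# spread evenly across the whole day.

SECONDS_PER_DAY = 24 * 60 * 60


def avg_event_bytes(gb_per_day: float, events_per_second: float) -> float:
    """Average event size implied by a daily volume and an event rate."""
    bytes_per_second = gb_per_day * 10**9 / SECONDS_PER_DAY
    return bytes_per_second / events_per_second


def events_per_second(gb_per_day: float, event_bytes: float) -> float:
    """Average event rate for a daily volume and a fixed event size."""
    return gb_per_day * 10**9 / SECONDS_PER_DAY / event_bytes


if __name__ == "__main__":
    size = avg_event_bytes(175, 2750)
    print(f"average event size: {size:.0f} bytes")
    # 175 GB/day give or take 50 GB
    for volume in (125, 175, 225):
        rate = events_per_second(volume, size)
        print(f"{volume} GB/day -> {rate:.0f} events/s")
```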

My plan is to use Filebeat to handle the multiline joining and then ship the logs from the app instances. The logs would go to (2) Logstash instances [m4.large = 2 CPU / 8GB RAM]. There are only a couple of simple grok filters that need to run, one mainly for the timestamp and another for a specific value. From there the output goes to (4) Elasticsearch instances [m4.large], each with a 500GB EBS volume of standard SSD. The logs are only kept for a week, so Curator will run as a cron job to clear them out after that.
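The one-week retention rule can be sketched as a date comparison over index names. This is not Curator's API; it is a standalone illustration that assumes daily indices named in the `logstash-YYYY.MM.dd` style, and `expired_indices` is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta

# Assumed naming scheme for daily indices (hypothetical for this setup).
INDEX_FORMAT = "logstash-%Y.%m.%d"


def expired_indices(names, today, keep_days=7):
    """Return the index names whose date is more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            day = datetime.strptime(name, INDEX_FORMAT).date()
        except ValueError:
            # Not a dated log index (for example ".kibana"): leave it alone.
            continue
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["logstash-2017.03.20", "logstash-2017.03.24",
             "logstash-2017.03.31", ".kibana"]
    print(expired_indices(names, today=date(2017, 3, 31)))
```

A cron job would run a check like this once a day and delete whatever it returns; the actual deletion is left to Curator.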

For Elasticsearch I'm planning on (3) of the nodes being master/data and (1) node being data only. I would also probably run Kibana on one of the instances, or perhaps on a separate t2 instance, since I don't think it uses many resources. I'm thinking of running with 5 shards and 1 replica for ES. For host discovery I was going to use "discovery.zen.ping.unicast.hosts", which would include all 4 nodes.
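Two numbers follow from that layout and are easy to check. With zen discovery, Elasticsearch versions of this era also have a `discovery.zen.minimum_master_nodes` setting, for which the usual guidance is a majority of the master-eligible nodes; and each index gets primaries times (1 + replicas) shard copies. The sketch below computes both for 3 master-eligible nodes and 5 shards with 1 replica. The helper names are illustrative, and whether that quorum setting applies depends on the Elasticsearch version in use.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: (master-eligible nodes / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def shard_copies_per_index(primaries: int, replicas: int) -> int:
    """Total shard copies (primaries plus replicas) created for one index."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # 3 master/data nodes plus 1 data-only node, 5 shards and 1 replica.
    print("quorum of master-eligible nodes:", minimum_master_nodes(3))
    print("shard copies per index:", shard_copies_per_index(5, 1))
```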

Since the log rotation is so aggressive on the app instances, I feel there is a risk of missing logs if the pipeline backs up on Logstash. In that case I would add two Redis instances between Filebeat and Logstash, but I'm going to try it without them first.

With (4) instances of ES with 500GB each and 1 replica enabled, does that mean I technically have 1TB of storage, since the other 1TB would be for the replicas?

Any recommendations on the setup or areas of concern?

warkolm · March 31, 2017, 10:32pm

Part of this is going to be deploying, seeing how it all performs, and adjusting as required.
Looks good though!

sila5 · April 3, 2017, 5:08pm

Thanks for the input.

Also, when it comes to shard replicas: if I have 1 replica of every primary shard, does that mean it will use double the space?

warkolm · April 3, 2017, 8:42pm

Yes.
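The doubling can be put into numbers for the figures in this thread. The sketch below is a rough estimate only: it assumes the on-disk index size equals the raw log size (`index_ratio=1.0`, a hypothetical parameter that would need to be measured on real data, since mappings and compression change it) and ignores any free-space headroom the cluster needs.

```python
def usable_primary_gb(nodes: int, gb_per_node: float, replicas: int) -> float:
    """Disk left for primary data once every shard also has its replicas."""
    return nodes * gb_per_node / (1 + replicas)


def required_raw_gb(gb_per_day: float, retention_days: int,
                    replicas: int, index_ratio: float = 1.0) -> float:
    """Total disk needed to hold retention_days of data, replicas included.

    index_ratio is an assumed ratio of on-disk index size to raw log size.
    """
    return gb_per_day * retention_days * index_ratio * (1 + replicas)


if __name__ == "__main__":
    total = 4 * 500  # (4) nodes with 500GB each
    print("raw disk in cluster (GB):", total)
    print("usable for primaries (GB):", usable_primary_gb(4, 500, replicas=1))
    print("needed for 7 days at 175GB/day (GB):",
          required_raw_gb(175, 7, replicas=1))
```

Under those assumptions, 4 x 500GB with 1 replica leaves about 1TB for primary data, while a week at 175GB/day comes to about 1.2TB of primaries, or about 2.4TB with replicas, so disk usage is one of the things to watch after deploying.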
