I have the following architecture in place:
400-800 EC2 instances running Filebeat ---> AWS NLB ---> 10 Logstash EC2 instances (100 GB SSD with 500 IOPS for the queue) ---> 14 i3.2xlarge Elasticsearch data nodes (3 masters).
Every evening we serve heavy traffic that generates 80K events/sec on the Logstash nodes for 30 minutes. During the same window, Elasticsearch shows an indexing rate of 150K docs/sec.
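To put the peak in perspective, here is a small Python sketch of the arithmetic behind those numbers. It assumes the NLB spreads the load evenly over the 10 Logstash nodes, which may not hold in practice because an NLB balances per connection, not per event. The function names are hypothetical and only for illustration.

```python
# Back-of-the-envelope numbers for the evening peak.
# Assumption: load is spread evenly across the Logstash nodes.
PEAK_EVENTS_PER_SEC = 80_000   # observed on the Logstash tier
PEAK_WINDOW_SEC = 30 * 60      # the 30-minute peak window
LOGSTASH_NODES = 10


def per_node_rate(total_eps: float, nodes: int) -> float:
    """Average events/sec each Logstash node must accept under even balancing."""
    return total_eps / nodes


def events_in_window(total_eps: float, window_sec: int) -> int:
    """Total events arriving at the Logstash tier during the peak window."""
    return int(total_eps * window_sec)


if __name__ == "__main__":
    print(f"per node: {per_node_rate(PEAK_EVENTS_PER_SEC, LOGSTASH_NODES):,.0f} events/s")
    print(f"peak window total: {events_in_window(PEAK_EVENTS_PER_SEC, PEAK_WINDOW_SEC):,} events")
```

With the figures above this works out to 8,000 events/s per node and 144,000,000 events over the 30-minute window.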
During this time Filebeat sends data to 3 different indices, while a 4th index is being written by AWS Lambda.
Of the 3 indices Filebeat sends data to, I need 1 to be searchable in near real time.
Can someone suggest how I can achieve this at this scale?
Daily ingestion is around 1.5-3 TB.
During that 30-minute window, the Logstash queue uses 11-15 GB of SSD on almost every Logstash node.
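A similar Python sketch converts the daily volume and the queue size into MB/s. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and that each node's queue grows from near empty to its peak over the 30-minute window. Because the queue only holds events Logstash has not yet shipped to Elasticsearch, the queue figure is net growth (inflow minus drain), not total inflow. The helper name is hypothetical.

```python
# Rough throughput figures derived from the numbers in the question.
# Assumptions: decimal units, and the queue fills from ~empty over the window.
SECONDS_PER_DAY = 24 * 60 * 60
PEAK_WINDOW_SEC = 30 * 60


def avg_rate_mb_per_sec(total_bytes: float, seconds: float) -> float:
    """Average rate in decimal MB/s for a byte count spread over a duration."""
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    # Daily ingestion averaged over 24 hours (the evening peak is far higher).
    for tb in (1.5, 3.0):
        rate = avg_rate_mb_per_sec(tb * 1e12, SECONDS_PER_DAY)
        print(f"{tb} TB/day ~ {rate:.1f} MB/s averaged over 24h")
    # Net queue growth per Logstash node during the peak window.
    for gb in (11, 15):
        rate = avg_rate_mb_per_sec(gb * 1e9, PEAK_WINDOW_SEC)
        print(f"{gb} GB queue in 30 min ~ {rate:.1f} MB/s net growth per node")
```

This gives roughly 17.4-34.7 MB/s of daily average ingestion and about 6.1-8.3 MB/s of net queue growth per Logstash node during the peak.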