I have a requirement where multiple hosts (approximately 120) send data via Filebeat. The volume is higher during business hours and lower at night. Average ingest is about 35 GB per day (roughly 1.5 million records per day).
Peak load is 200 MB of data from a few servers within 5 minutes.
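To put those numbers on a per-second scale, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB) and that the 200 MB peak is the combined total from those servers; the function and key names are just for illustration.

```python
# Figures from the post. Decimal units (1 GB = 1000 MB) are an assumption.
DAILY_GB = 35
DAILY_RECORDS = 1_500_000
PEAK_MB = 200            # assumed to be the combined total from the few servers
PEAK_WINDOW_S = 5 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rates():
    """Return average and peak ingest rates derived from the daily figures."""
    avg_mb_per_s = DAILY_GB * 1000 / SECONDS_PER_DAY
    avg_records_per_s = DAILY_RECORDS / SECONDS_PER_DAY
    avg_record_kb = DAILY_GB * 1_000_000 / DAILY_RECORDS
    peak_mb_per_s = PEAK_MB / PEAK_WINDOW_S
    return {
        "avg_mb_per_s": avg_mb_per_s,
        "avg_records_per_s": avg_records_per_s,
        "avg_record_kb": avg_record_kb,
        "peak_mb_per_s": peak_mb_per_s,
    }


if __name__ == "__main__":
    for name, value in ingest_rates().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the average works out to roughly 0.4 MB/s (about 17 records per second, about 23 KB per record), and the peak to roughly 0.67 MB/s. Since most of the traffic arrives during business hours, the real daytime average will be higher than the 24-hour figure.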
The setup I am considering (to be hosted in AWS) is:
Filebeat > Logstash > Elasticsearch
Logstash will have two instances (r5a.xlarge, i.e. 4 vCPUs, 32 GB RAM).
Elasticsearch will have 4 nodes (m5.xlarge.elasticsearch, i.e. 4 vCPUs, 16 GB RAM), with a 750 GB EBS volume attached to each instance.
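As a rough check on the storage side, the sketch below estimates how many days of data the cluster could hold. The replica count and the usable-disk fraction are my assumptions, not settled parts of the design, and it also assumes the on-disk index size is about the same as the raw ingest size, which may not hold in practice.

```python
# Cluster figures from the post.
NODES = 4
EBS_GB_PER_NODE = 750
DAILY_GB = 35

# Assumptions (not decided yet): one replica per shard, 85% of disk usable.
REPLICAS = 1
USABLE_FRACTION = 0.85


def retention_days(nodes=NODES, gb_per_node=EBS_GB_PER_NODE,
                   daily_gb=DAILY_GB, replicas=REPLICAS,
                   usable_fraction=USABLE_FRACTION):
    """Estimate days of data that fit, assuming index size equals raw size."""
    usable_gb = nodes * gb_per_node * usable_fraction
    stored_per_day_gb = daily_gb * (1 + replicas)  # primary plus replicas
    return usable_gb / stored_per_day_gb


if __name__ == "__main__":
    print(f"Estimated retention: {retention_days():.1f} days")
```

With these assumed values the estimate is about 36 days; without replicas it would be roughly double that.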
My requirement is to have the data available as close to real time as possible. Is this configuration good enough, or do I need to bring in a solution like Redis for caching, or use more servers for Logstash/Elasticsearch?