I have a requirement where multiple hosts (approximately 120) send data via Filebeat. The volume is higher during business hours and lower at night. Average total ingestion is 35 GB per day (1.5 million records per day).
The peak load is 200 MB of data from a few servers within 5 minutes.
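For a rough sense of scale, the figures above can be converted to per-second rates. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 10^9 bytes) and an even spread over 24 hours, which the business-hours pattern makes optimistic, and the peak figure covers only the few servers mentioned, not the whole fleet.

```python
# Back-of-the-envelope ingest rates from the figures quoted in the post.
# Assumptions (not from the post): decimal units, load spread evenly over a day.
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 35 * 10**9        # 35 GB per day
daily_records = 1_500_000       # 1.5 million records per day
peak_bytes = 200 * 10**6        # 200 MB burst from a few servers
peak_window_s = 5 * 60          # within 5 minutes

avg_mb_per_s = daily_bytes / SECONDS_PER_DAY / 10**6
avg_records_per_s = daily_records / SECONDS_PER_DAY
avg_record_kb = daily_bytes / daily_records / 10**3
peak_mb_per_s = peak_bytes / peak_window_s / 10**6

print(f"average throughput : {avg_mb_per_s:.3f} MB/s")
print(f"average event rate : {avg_records_per_s:.1f} records/s")
print(f"average record size: {avg_record_kb:.1f} KB")
print(f"burst throughput   : {peak_mb_per_s:.3f} MB/s")
```

The average record works out to roughly 23 KB, which is consistent with multiline events such as stack traces rather than single log lines.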
The setup I am considering (hosted in AWS) is:
Filebeat > Logstash > Elasticsearch
Logstash will have two instances (r5a.xlarge, i.e. 4 CPU, 32 GB RAM).
Elasticsearch will have 4 nodes (m5.xlarge.elasticsearch, i.e. 4 CPU, 16 GB RAM) with a 750 GB EBS volume attached to each instance.
My requirement is to have the data available as close to real time as possible. Is this configuration good enough, or do I need to bring in something like Redis as a buffer, or use more servers for Logstash/Elasticsearch?
There is no way for anyone to answer that. For one thing, you haven't said what your Logstash pipelines are doing, or even why you are putting Logstash between Filebeat and Elasticsearch. The cost of pipelines varies enormously.
The only way to know the answer is to try it. Build a solution, measure its throughput, then scale the part of the solution that is a bottleneck.
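One possible way to measure Logstash throughput is to sample its monitoring API twice and divide the difference in emitted events by the elapsed time. The sketch below assumes the API is enabled on its default address (`localhost:9600`) and that the `events.out` counter from `/_node/stats/events` is the figure of interest; the function names are made up for illustration.

```python
import json
import time
import urllib.request

# Assumed default address of the Logstash monitoring API; adjust per deployment.
STATS_URL = "http://localhost:9600/_node/stats/events"


def fetch_events_out(url=STATS_URL):
    """Return the cumulative number of events Logstash has emitted so far."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)["events"]["out"]


def events_per_second(first_count, second_count, seconds):
    """Average event rate between two cumulative counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (second_count - first_count) / seconds


def measure(interval=10.0, url=STATS_URL):
    """Sample the counter twice, `interval` seconds apart, and return the rate."""
    start = time.monotonic()
    first = fetch_events_out(url)
    time.sleep(interval)
    second = fetch_events_out(url)
    return events_per_second(first, second, time.monotonic() - start)
```

Calling `measure()` while replaying a representative batch of logs gives an events-per-second figure for one instance. Repeating the measurement while watching Elasticsearch indexing shows which stage is the bottleneck.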
My JBoss/WildFly logs are multiline and are not fully qualified JSON, because they contain other fields such as the timestamp, Java thread name, class name, etc. Logstash will parse those multiline logs into proper JSON, and those fields will be used to create visualizations in Kibana.
I understand that finding the most suitable configuration is a matter of trial and error, but is there any baseline available for Logstash or Elasticsearch?
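To illustrate the kind of work such a pipeline does, here is a minimal Python sketch of multiline merging and field extraction. It is not Logstash configuration, and the line layout is an assumption (timestamp, level, logger in square brackets, thread in parentheses, then the message); the sample lines and class names are invented.

```python
import json
import re

# Assumed layout of the first line of each event (hypothetical; adjust to the
# real log pattern):
# 2024-01-15 10:23:45,123 ERROR [com.example.OrderService] (default task-1) msg
EVENT_START = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<message>.*)$"
)


def parse_events(lines):
    """Merge continuation lines into the preceding event and extract fields."""
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = EVENT_START.match(line)
        if match:
            # A line starting with a timestamp begins a new event.
            events.append(match.groupdict())
        elif events:
            # Anything else (e.g. a stack trace line) belongs to the last event.
            events[-1]["message"] += "\n" + line
        # Continuation lines seen before any event start are dropped.
    return events


SAMPLE = [
    "2024-01-15 10:23:45,123 ERROR [com.example.OrderService] (default task-1) Order failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.OrderService.place(OrderService.java:42)",
    "2024-01-15 10:23:46,001 INFO  [com.example.OrderService] (default task-2) Retrying",
]

events = parse_events(SAMPLE)
for event in events:
    print(json.dumps(event))
```

The cost of this step depends heavily on the pattern and on how long the merged events are, which is one reason pipeline throughput is hard to predict without measuring it.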
No, there is not.
If you are doing this in the cloud it will only cost a few cents to set up a micro instance and get a feel for what its throughput is.