We are implementing an ELK stack to manage our logs. We have all kinds of logs (firewall, syslog, application logs, hardware devices, etc.). We are running Logstash on Docker Swarm.
The estimated throughput is around 300 GB per day.
I was wondering how to size Logstash once we are in production.
I see two approaches:
one big Logstash instance with X GB of RAM. But how much?
one Logstash container per type of log:
1 container to parse and index firewall logs
1 container to parse and index application logs
...
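For the second approach, a minimal Docker Swarm stack sketch could look like the following (image tag, service names, heap sizes and ports are only illustrative assumptions, not a recommendation):

```yaml
version: "3.7"
services:
  logstash-firewall:            # parses and indexes firewall logs
    image: docker.elastic.co/logstash/logstash:7.17.0   # assumed version
    ports:
      - "5000:5000"
    environment:
      LS_JAVA_OPTS: "-Xms4g -Xmx4g"    # heap per container, see the RAM discussion below
    volumes:
      - ./pipeline/firewall:/usr/share/logstash/pipeline:ro
  logstash-apps:                # parses and indexes application logs
    image: docker.elastic.co/logstash/logstash:7.17.0
    ports:
      - "5001:5001"
    environment:
      LS_JAVA_OPTS: "-Xms4g -Xmx4g"
    volumes:
      - ./pipeline/apps:/usr/share/logstash/pipeline:ro
```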
We are using Logstash in two different roles (shipper and indexer).
When it comes to RAM, keep in mind the JVM's compressed ordinary object pointer (compressed oops) limit: https://www.baeldung.com/jvm-compressed-oops. You should not give your JVM more than 32 GB of RAM (you have to check where the exact sweet spot lies on your machine, but 31 GB of RAM per Logstash instance is the maximum you would want to give).
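To make that concrete, the heap is set in Logstash's config/jvm.options (or, for the Docker image, via the LS_JAVA_OPTS environment variable). A minimal sketch, where the 16 GB value is only a placeholder you would tune to your workload:

```
# config/jvm.options -- keep Xms and Xmx equal,
# and stay below the ~32 GB compressed-oops threshold
-Xms16g
-Xmx16g
```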
I don't have complex workflows, so I don't need pipeline-to-pipeline communication. Moreover, it is still a beta feature.
The idea behind multiple Logstash containers is to have several containers, each listening on its own port (e.g. 5000, 5001, 5002), while the VM hostname stays the same (e.g. logstash.local).
All containers would send to the same output.
The final flow would be something like: Logstash (shipper role) or Beats => Kafka => Logstash (indexer role) => ES.
Firewall logs go to logstash.local:5000 (1st container)
Application logs go to logstash.local:5001 (2nd container)
etc.
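As a sketch of what each stage's pipeline could look like (host names, topic names and the tcp input are assumptions for illustration; a firewall could just as well ship via syslog or Beats):

```
# Shipper container for firewall logs, listening on port 5000 (illustrative)
input {
  tcp {
    port => 5000
    type => "firewall"
  }
}
output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"   # assumed broker addresses
    topic_id => "firewall-logs"
    codec => json
  }
}

# Indexer container: reads from Kafka and writes to Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["firewall-logs", "application-logs"]
    codec => json
  }
}
filter {
  # grok/mutate per log type would go here
}
output {
  elasticsearch {
    hosts => ["http://es1:9200"]                     # assumed ES endpoint
    index => "logs-%{type}-%{+YYYY.MM.dd}"
  }
}
```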
Just on this RAM limit point: I have a system with 96 GB of RAM. I didn't know about this limit and assigned 50 GB of RAM to the JVM, and pretty much every day the system was hanging. I spent quite some time debugging the Java dumps, and only after searching this forum did I find out about this limit.