We're having some issues with one of our Logstash clusters. There are 6 machines, each an AWS instance with 32 GB of RAM and 16 cores, running Logstash in a Docker container. They are dedicated Logstash servers. RAM usage had been stable for the past months, staying at around 17-18 GB. You can see in the image below how the behaviour has changed lately:
We haven't changed the version or made any changes to the configuration.
What is stranger is that we have a parallel, similar cluster with the exact same machines and server versions, processing similar logs at a slightly lower load, but that one has remained stable.
The cluster hasn't had any major change in load either. This happens every 24-28 hours, at which point we have to restart the Logstash containers.
We're quite a big team and there are quite a few changes, so I cannot share the pipelines with you. I was wondering if there is a manual or any guide on how to debug Logstash's memory usage; maybe some profiling could help?
Not that I know of, but Logstash is a Java application, so you could take a heap dump and try to analyze it, although I do not have experience with this.
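If it helps, here is a rough sketch of how you might grab a heap dump from the container for offline analysis in a tool like Eclipse MAT or VisualVM. The container name `logstash`, the dump path, and the assumption that the Java process is PID 1 inside the container are all mine, and it assumes the bundled JDK in your image ships `jcmd`; adjust to your setup.

```python
# Rough sketch: ask the Logstash JVM inside a Docker container for a heap
# dump, then copy it to the host so it survives a container restart.
# Assumptions (adjust as needed): the container is named "logstash" and the
# Java process is PID 1 inside it. If it is not PID 1, running
# "docker exec logstash jcmd -l" lists the JVMs and their PIDs.
import subprocess

CONTAINER = "logstash"                    # hypothetical container name
DUMP_PATH = "/tmp/logstash-heap.hprof"    # path inside the container

# Trigger the heap dump via the JVM's diagnostic command interface.
subprocess.run(
    ["docker", "exec", CONTAINER, "jcmd", "1", "GC.heap_dump", DUMP_PATH],
    check=True,
)

# Copy the dump out of the container for analysis on the host.
subprocess.run(["docker", "cp", f"{CONTAINER}:{DUMP_PATH}", "."], check=True)
```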
Did anything change in the pipelines? For the behaviour to change like that I would expect some change in the load; it doesn't necessarily have to be an increase in the event rate, but changes in the messages themselves could affect performance depending on the filters used.
But without the pipelines it is not possible to guess what could be causing this increase in memory usage.
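One thing you could do without sharing the pipelines is poll Logstash's node stats API and record JVM heap usage next to the event counters over time, to see whether the growth tracks the event rate or happens independently of it. A minimal sketch, assuming the monitoring API is reachable on the default localhost:9600 (change the host/port and the sampling interval to whatever fits your setup):

```python
# Rough sketch: sample the Logstash node stats API once a minute and print
# heap usage alongside the event counters, so memory growth can be compared
# against throughput. Assumes the API is at http://localhost:9600.
import json
import time
import urllib.request

API = "http://localhost:9600/_node/stats"

def fetch(path):
    # Fetch one section of the node stats API and parse the JSON response.
    with urllib.request.urlopen(f"{API}/{path}") as resp:
        return json.load(resp)

while True:
    jvm = fetch("jvm")["jvm"]["mem"]
    events = fetch("events")["events"]
    print(
        time.strftime("%Y-%m-%dT%H:%M:%S"),
        f"heap_used={jvm['heap_used_in_bytes'] // 1024 // 1024}MB",
        f"({jvm['heap_used_percent']}%)",
        f"events_in={events['in']}",
        f"events_out={events['out']}",
    )
    time.sleep(60)
```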