My code retrieves data from an external API, converts the response to JSON, and then iterates over it n times. The entire process takes approximately 30 minutes to complete, and the final output is sent to Elasticsearch. Everything functions correctly except for the garbage collector, which is not properly clearing memory. As a result, memory consumption continuously increases, eventually reaching over 30GB. To manage this issue, I have been regularly restarting Logstash to clear the memory.
Can anyone suggest an alternative solution for handling Logstash garbage collection?
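For context, a minimal sketch of the kind of pipeline being described (the plugin names and field names here are assumptions, since the post does not say which input or filter is actually used):

```
input {
  # Assumed: poll the external API on a schedule and parse the response as JSON
  http_poller {
    urls => { api => "https://example.com/api/data" }
    schedule => { every => "30m" }
    codec => "json"
  }
}

filter {
  # Assumed: split the array in the response into individual events
  split {
    field => "items"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "api-data"
  }
}
```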
If there is a memory leak, then to find it you will need to take a heap dump (the default jvm.options file includes -XX:+HeapDumpOnOutOfMemoryError, so this should happen automatically when the JVM runs out of memory). I would recommend reducing the maximum heap to perhaps 8 GB so that you have a smaller heap dump to work with.
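As a sketch, the relevant lines in jvm.options might look like this (exact defaults vary between Logstash versions, and the dump path below is just an example):

```
## Heap size: keep -Xms and -Xmx equal, and lower the maximum so any
## heap dump stays at a manageable size
-Xms8g
-Xmx8g

## Present in the default jvm.options shipped with Logstash
-XX:+HeapDumpOnOutOfMemoryError

## Optional: control where the .hprof file is written (example path)
-XX:HeapDumpPath=/var/log/logstash/
```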
If you open the dump in a heap dump analyzer such as Eclipse Memory Analyzer (MAT) and run the default Leak Suspects report, it should show the #1 leak suspect front and centre on the first screen that is displayed.
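If you don't want to wait for an OutOfMemoryError, you can also take a heap dump on demand with the standard JDK tools and open the resulting .hprof file in MAT (the output file path here is just an example):

```
# Find the Logstash JVM process id
jps -l | grep -i logstash

# Dump only live objects (forces a full GC first) to a file MAT can open
jmap -dump:live,format=b,file=/tmp/logstash-heap.hprof <pid>
```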
What to do after that very much depends on what that leak suspect is.