As a rule of thumb, does it make sense to set the Logstash heap size equal (or proportional) to the Elasticsearch heap size, or are there other factors to take into consideration?
For instance, would an ELK stack installation where the Logstash heap size is 1/5 of the ES heap size be recommended?
There's no correlation between the two.
Logstash should really only need a few GB at most, just enough to deal with messages it holds in its internal queue while processing.
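If you do need to change it, the Logstash heap is set in config/jvm.options, much like Elasticsearch's. A minimal sketch (the 4g value below is only an example; the shipped default is 1g):

```
# config/jvm.options -- example values only; Logstash ships with 1g
# Keep minimum and maximum equal to avoid heap resizing at runtime.
-Xms4g
-Xmx4g
```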
There is much more that affects Logstash's use of JVM heap than just its internal queue. If you are doing any of the following, you will increase your need for JVM heap (see the configuration sketches after this list)...
starting multiple pipelines (in 6.x and later)
increasing input buffers and workers to better handle bursts of incoming data without packet loss (a MUST-HAVE when using any UDP-based input)
doing a lot of enrichment and using the caching capabilities of filters such as DNS and GeoIP for increased performance
using the translate filter to make data more human-friendly with lots of, or even a few large, dictionaries
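For the multiple-pipelines point, each entry in config/pipelines.yml runs with its own workers, batches, and in-flight events, and they all share the one JVM heap. A minimal sketch (pipeline IDs, paths, and sizes are invented for illustration):

```yaml
# config/pipelines.yml -- two independent pipelines sharing the same JVM heap
- pipeline.id: syslog_udp
  path.config: "/etc/logstash/conf.d/syslog_udp.conf"
  pipeline.workers: 4
  pipeline.batch.size: 250
- pipeline.id: beats
  path.config: "/etc/logstash/conf.d/beats.conf"
  pipeline.workers: 2
  pipeline.batch.size: 125
```

Roughly speaking, heap pressure from in-flight events scales with workers × batch size × average event size, summed across all pipelines.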
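For the other points, a single pipeline combining a buffered UDP input with cached DNS/GeoIP enrichment and a translate dictionary might look roughly like this; field names, ports, and sizes are invented for the example, and the bigger the buffers, caches, and dictionaries, the more heap they consume:

```
# /etc/logstash/conf.d/syslog_udp.conf -- illustrative values only
input {
  udp {
    port                 => 5514
    workers              => 4         # extra input threads to absorb bursts
    queue_size           => 10000     # in-memory queue of raw datagrams (held on the heap)
    receive_buffer_bytes => 16777216  # ask the OS for a 16 MB receive buffer
  }
}

filter {
  dns {
    resolve        => ["server_name"] # hypothetical field containing a hostname
    action         => "replace"
    hit_cache_size => 10000           # successful lookups cached on the heap
    hit_cache_ttl  => 300
  }
  geoip {
    source     => "src_ip"            # hypothetical field containing an IP
    cache_size => 10000               # GeoIP lookup cache also lives on the heap
  }
  translate {
    field           => "status_code"
    destination     => "status_text"
    dictionary_path => "/etc/logstash/dictionaries/status_codes.yml"
    fallback        => "unknown"      # the whole dictionary is loaded into memory
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}
```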
If you are doing ALL of the above, it is possible that Logstash's JVM heap requirements will even exceed those of Elasticsearch. Just yesterday we had to bump a Logstash instance from 6GB to 8GB because it was exhausting the JVM heap. That is a small price to pay for producing truly high-quality data at very high event rates.