Heap Sizing on Data Nodes

Conventional wisdom states that 50% of RAM should be allocated to the JVM heap. However, when the bulk of the work done by the cluster is indexing, as in the logging use case, doesn't that justify increasing the heap allocation above 50% to help with indexing?

A larger heap does not necessarily improve indexing throughput. As outlined in this blog post, you should aim to make the heap as small as possible while ensuring that the node does not suffer long or frequent GC pauses once it fills up or when you query your data. Elasticsearch also makes significant use of off-heap memory, so I would not recommend going above 50% of available RAM.
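
As an illustration (not a recommendation — the right value depends on your workload), on a 64 GB data node you would stay at or below the 50% guideline, and conventionally keep the minimum and maximum heap equal so the heap is locked at startup. The file path and sizes below are assumptions for the example:

```
# config/jvm.options.d/heap.options — illustrative values for a 64 GB node
# min and max are set equal so the JVM never resizes the heap at runtime
-Xms26g
-Xmx26g
```

Staying comfortably under 32 GB also keeps the JVM using compressed object pointers, which makes each gigabyte of heap go further.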

When you say "off-heap data", are you referring to FS buffers or other off-heap memory requirements?

It depends a bit on which version you're using and how your system is configured, but a major consumer of off-heap memory is network communication, and indexing involves quite a lot of it. Running out of off-heap memory is pretty bad: on Linux it can lead to the process being killed by the OOM killer, since that memory cannot be reclaimed via GC the way heap memory can.
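
If you want to see how heap and overall process memory compare on your own nodes, the nodes stats API reports both. A request like the following (console syntax; the `filter_path` is just to trim the response) shows JVM heap usage alongside OS-level memory:

```
GET _nodes/stats/jvm,os?filter_path=nodes.*.jvm.mem,nodes.*.os.mem
```

Comparing `jvm.mem` against `os.mem` over time gives a rough sense of how much memory the process is using outside the heap.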
