JVM Heap Advice from documentation

Hello,

On the "Setting the heap size" documentation page, the advice for setting the heap size is as follows:

  1. less than 50% of physical RAM, because of kernel caches
  2. less than 32 GB, or even 26 GB, because of compressed oops (ordinary object pointers)

However, (1) is an odd recommendation: if ES is the primary application on the system, then surely it would benefit from a larger heap, so that it can keep data in its own cache rather than relying on the kernel to cache disk access? Also, why 50%? Is it just an old wives' tale, or is there some science behind it?

Elasticsearch mostly doesn't have its "own cache" for on-disk data. Modern operating systems do a very good job of this already, so it makes a good deal of sense to rely on that.

Perhaps a poor choice of words: the phrase "old wives' tale" kinda excludes from this discussion those of us who are old wives.

In this case there is indeed science behind this guideline. Elasticsearch heavily uses off-heap data, and AIUI the limit on off-heap data is by default equal to the heap size. So for the heap and the off-heap data to both fit in RAM, you should set the heap size to no more than half of the total RAM.
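To make that arithmetic concrete, here is a minimal Python sketch. It assumes, as described above, that the off-heap limit defaults to the heap size; the function names are made up for illustration and are not part of Elasticsearch or the JVM.

```python
def worst_case_footprint_gb(heap_gb, off_heap_limit_gb=None):
    """Heap plus the most off-heap memory the process may use, in GB.

    Assumption from this thread: when no explicit off-heap limit is given,
    it defaults to the heap size.
    """
    if off_heap_limit_gb is None:
        off_heap_limit_gb = heap_gb
    return heap_gb + off_heap_limit_gb


def fits_in_ram(heap_gb, ram_gb):
    """True if the heap and its worst-case off-heap usage both fit in RAM."""
    return worst_case_footprint_gb(heap_gb) <= ram_gb


if __name__ == "__main__":
    ram_gb = 64  # hypothetical machine
    for heap_gb in (16, 32, 40):
        print(heap_gb, worst_case_footprint_gb(heap_gb), fits_in_ram(heap_gb, ram_gb))
```

On the hypothetical 64 GB machine, a 32 GB heap is the largest that still fits under this assumption, which is where the 50% figure comes from.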

OK, so why not ask the reverse question: if heap usage isn't that important to ES beyond a certain point (say, the ability to hold a search result set in memory before transmission), why not allocate less memory to it, say 25%? Wouldn't that improve performance, since more memory is now available for the kernel disk cache and off-heap data?

Note that the documentation gives 50% as a recommended upper limit, not a target:

Set Xmx to no more than 50% of your physical RAM

You can of course set it lower. I think that reducing the heap size also reduces the space available for off-heap data (they share a limit), but you are right that it increases the space available for the filesystem cache. Would that improve performance? It depends™ :) It means a full GC would be faster, but maybe more frequent, and it means Elasticsearch can cache less data internally(*). With some workloads this could be a net win. Only proper benchmarking can tell.

(*) Elasticsearch relies on the filesystem cache for fast access to on-disk data but it does have its own caches for other data that isn't kept on-disk.
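For a rough feel of the trade-off between a 25% and a 50% heap, here is a small Python sketch. It is only a back-of-the-envelope budget under stated assumptions: the off-heap limit is taken to equal the heap size (as I understand it), the compressed-oops threshold uses the conservative 26 GB figure mentioned earlier in this thread, and the function names are hypothetical. It says nothing about actual performance, which only benchmarking can show.

```python
def heap_upper_bound_gb(ram_gb, oops_threshold_gb=26.0):
    """Upper limit on the heap: half of RAM, capped at the compressed-oops threshold.

    The 26 GB default is the conservative figure quoted in this thread,
    not a value read from any particular JVM.
    """
    return min(ram_gb * 0.5, oops_threshold_gb)


def memory_budget(ram_gb, heap_fraction):
    """Worst-case split of RAM for a heap sized as a fraction of RAM.

    Assumes the off-heap limit equals the heap size, and that whatever is
    left over is the minimum available to the filesystem cache.
    """
    heap = min(ram_gb * heap_fraction, heap_upper_bound_gb(ram_gb))
    off_heap_limit = heap
    return {
        "heap_gb": heap,
        "off_heap_limit_gb": off_heap_limit,
        "left_for_fs_cache_gb": ram_gb - heap - off_heap_limit,
    }


if __name__ == "__main__":
    for fraction in (0.25, 0.5):
        print(fraction, memory_budget(32, fraction))
```

On a hypothetical 32 GB machine, the 50% setting leaves nothing guaranteed for the filesystem cache if the off-heap limit is fully used, while the 25% setting leaves at least 16 GB. In practice the off-heap usage is usually well below its limit, so the real filesystem cache will tend to be larger than this lower bound.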
