Why should we set -Xms and -Xmx to be equal to each other for Elasticsearch?

We are currently on version 6.4 of Elasticsearch, and the documentation here recommends setting the min and max heap sizes equal to each other. I'd like to understand the reasoning behind this.

One behavior we are noticing is that if the heap size is somewhere around 26-30 GB, it can take up to a few minutes for the cluster to (re)start, because the JVM needs to allocate the entire heap right at the beginning. Hence, we want to know what the implications are if we set -Xms to something like 50% of -Xmx.

I'm also very curious about this. It seems to defeat the purpose of the memory scalability the JVM provides, though I guess a fixed heap can give more consistent performance or behaviour. I'm not sure, but I'd be very curious to hear an Elastic engineer's reply.

That's very common in production for any Java application: you want to allocate the whole heap from the start.

Hi @imrimt ,

> up to a few minutes

how many nodes is that for? If it is for just one node, I would suspect that the node was under memory pressure. The OS may have to flush data to disk to make room for the JVM. It is normally advisable to have 64 GB of RAM when using such heap sizes, since you also need plenty of RAM for the file system cache.

Setting min and max to the same value, in combination with -XX:+AlwaysPreTouch, ensures that you pay any memory-allocation penalty during node startup rather than during node operation. Otherwise you risk taking a hit during search or indexing if the JVM decides to expand its heap. Also, by allowing the JVM to shrink the heap (min != max), you risk seeing this hit regularly.
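For reference, here is a minimal sketch of the relevant settings in Elasticsearch's `jvm.options` file (the 26 GB figure is just this thread's example; Elasticsearch's shipped `jvm.options` already enables AlwaysPreTouch by default):

```
# jvm.options (in the Elasticsearch config directory)

# Set min and max heap to the same value so the full heap
# is allocated up front and never resized at runtime.
-Xms26g
-Xmx26g

# Touch every heap page at startup, so the allocation cost is
# paid once at node start rather than during search/indexing.
-XX:+AlwaysPreTouch
```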


@HenningAndersen Thanks for the detailed answer. Yes, we are running a single node on a 104 GB machine, and the OS has around 17 GB of headroom (there are other processes running as well). So it's understandable that it can take some time to allocate 26-30 GB for the JVM.

What I was more interested in is addressed by your second point. Thanks!

You might want to check whether transparent huge pages (THP) are enabled. On RHEL 6 this caused big startup delays for us, so we've disabled it.
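To check this on a typical Linux box (the sysfs path below is where THP lives on most modern kernels; distributions may differ):

```shell
# Show the current transparent huge pages setting;
# the active value appears in brackets, e.g. "[always] madvise never"
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP for the running system (not persistent across reboots;
# add a boot-time setting or tuned profile to make it permanent)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```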