Is it ever ok to set ES's heap to > 50% RAM?


(Emily Wenberg) #1

I'm trying to learn some rules of thumb for when, if ever, it's OK to increase the min and max heap values for ES beyond 50% of physical memory.

This page recommends setting min and max heap to the same value, and setting both to no more than 50% of the available RAM on the box so that the rest is left for the OS to cache things (I think for Lucene?): https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html

My team recently ran into a problem where a node would sometimes have long GC pauses but be unable to free anything from old gen. That would seem to imply that the node in question was still using that memory for something. (Or possibly the way we are using ES triggered a memory leak.)

The total memory use on the node was only about 60%, so it would seem reasonable to try increasing ES's heap by a few GB, but I'm currently a bit scared off by the fact that it's against the general recommendation. The specific numbers we are currently using are 32 GB of RAM, with min and max heap both set to 16 GB.

Does the 50% physical RAM recommendation apply more to machines with smaller amounts of physical memory?

Relatedly, does high ES heap usage but otherwise low system memory usage point to anything unusual about why old gen could not be freed in this specific case?


(Christian Dahlqvist) #2

What is the full output of the cluster stats API? Do you have monitoring installed? If so, what does heap usage over time look like?
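If it helps for comparing against a heap graph, here is a minimal sketch of reading overall heap usage out of a saved cluster stats response. It assumes the body of `GET /_cluster/stats` is available as JSON and that the fields `nodes.jvm.mem.heap_used_in_bytes` and `nodes.jvm.mem.heap_max_in_bytes` are present; the sample body and both numbers in it are made up for illustration, and `heap_used_percent` is a hypothetical helper name.

```python
import json

# Hypothetical, heavily trimmed response body. Both values are invented
# for illustration and do not come from any real cluster.
SAMPLE_RESPONSE = """
{"nodes": {"jvm": {"mem": {"heap_used_in_bytes": 12000000000,
                            "heap_max_in_bytes": 16000000000}}}}
"""


def heap_used_percent(stats: dict) -> float:
    """Return cluster-wide JVM heap usage as a percentage of max heap."""
    mem = stats["nodes"]["jvm"]["mem"]
    return 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]


stats = json.loads(SAMPLE_RESPONSE)
print(f"cluster heap used: {heap_used_percent(stats):.1f}%")
```

Note that these figures are summed across all nodes, so a single node stuck in GC can be hidden by the rest of the cluster; per-node numbers are more useful for that.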


(Emily Wenberg) #3

I don't have an example of the cluster stats API output from during one of the GC pauses, but I can still post the current output if general information about the cluster would be useful.

We have monitoring based on the output of the cluster stats API, but it's not the official Elastic monitoring. Here is a heap use graph from one of the times a node paused for about 12 minutes:

The max heap for that node is 16219242496 bytes, or 15.1 GiB, so it's not actually OOMing.
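For anyone checking the conversion, here is a quick sketch of the arithmetic, taking 1 GiB as 2**30 bytes. The function name `bytes_to_gib` is just an illustrative helper.

```python
# Max heap reported for the node, in bytes (value quoted above).
max_heap_bytes = 16219242496


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB, where 1 GiB = 2**30 bytes."""
    return n_bytes / 2**30


print(f"{bytes_to_gib(max_heap_bytes):.1f} GiB")  # prints "15.1 GiB"
```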


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.