Current recommendations for large heaps (64GB+)

Hey folks,

I'd like to get an opinion on current recommendations for running Elasticsearch with large heaps (64-128GB) please.

I understand a heap of ~30GB is generally recommended for most installs so that the JVM can use compressed OOPs, and that once you cross this threshold you need to go above ~50GB before the extra heap pays off.

I also understand that larger heaps can be more prone to longer garbage collection times. Is this still a concern in 2020 using the G1GC garbage collector?

For context, we are currently running Elasticsearch 7 with a 30GB heap and the G1GC garbage collector. For the most part things are working well. We store data in daily indices with a single primary shard, usually 10-15GB in size with 10+ million documents. There are intermittent problems caused by end users running heavy aggregations over historical data in Kibana, which temporarily spike heap usage above the parent breaker limit, trip circuit breakers (http request too large) and cause issues for other applications that query the cluster.

I understand we could amend breakers and configure things so that the heavy aggregations would be killed before affecting other clients, but ideally it would be best to allow the visualizations to run.
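For reference, here is a rough sketch of where the parent breaker sits for the heap sizes in question. It assumes the 7.x defaults for `indices.breaker.total.limit` (95% of heap when `indices.breaker.total.use_real_memory` is true, 70% otherwise); the function name is just for illustration and a cluster with overridden settings will differ.

```python
# Sketch only: assumes Elasticsearch 7.x default parent breaker limits.
# 95% of heap with indices.breaker.total.use_real_memory=true (the default),
# 70% of heap when real memory accounting is disabled.
GIB = 1024 ** 3


def parent_breaker_limit_bytes(heap_bytes, use_real_memory=True):
    """Return the default parent breaker limit in bytes for a given heap size."""
    percent = 95 if use_real_memory else 70
    return heap_bytes * percent // 100


for heap_gib in (30, 64, 128):
    limit_gib = parent_breaker_limit_bytes(heap_gib * GIB) / GIB
    print(f"{heap_gib} GiB heap: parent breaker trips near {limit_gib:.1f} GiB")
```

A bigger heap raises the absolute limit, but the headroom above the breaker stays at the same 5% of heap, so a sufficiently heavy aggregation can still trip it.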

We are in a position to increase the heap memory of the nodes to somewhere in the region of 64 - 128GB and still leave at least 128GB RAM unused, free for filesystem cache.

Given that the G1GC collector is now considered stable, would there be any compelling reasons not to increase the heap?

Cheers

We still don't recommend running super large heaps, even with G1GC. You are better off looking at splitting the memory into smaller nodes.

I haven't seen anyone using heaps this size myself, so I can't comment any further, sorry!

Things are moving towards this ideal, with features like async search, and the release notes for the last three versions (7.7, 7.8, 7.9) indicate a pattern of work to reduce the memory footprint of aggregations too.

The official recommendation to keep compressed OOPs still stands, although we do encounter some folk who run with much larger heaps without major problems. Compressed pointers fit better into CPU caches, so there may be a performance hit even if you have the RAM to handle it, and reducing your filesystem cache can hurt performance too.
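If you want to confirm whether a node is actually running with compressed OOPs, the nodes info API (`GET _nodes/jvm`) reports it per node in the `using_compressed_ordinary_object_pointers` field. A minimal sketch of checking such a response follows; the helper name and the sample response are made up for illustration, and fetching the JSON from your cluster is left out.

```python
def nodes_without_compressed_oops(nodes_info):
    """Return sorted names of nodes not confirmed to use compressed OOPs.

    `nodes_info` is the parsed JSON body of GET _nodes/jvm. A missing field
    is treated as "not confirmed", so such nodes are listed as well.
    """
    flagged = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        value = node.get("jvm", {}).get("using_compressed_ordinary_object_pointers")
        # The API reports this flag as a string; str() also tolerates a boolean.
        if str(value).lower() != "true":
            flagged.append(node.get("name", node_id))
    return sorted(flagged)


# Hypothetical, heavily trimmed response used only to exercise the helper.
sample = {
    "nodes": {
        "abc123": {"name": "node-a", "jvm": {"using_compressed_ordinary_object_pointers": "true"}},
        "def456": {"name": "node-b", "jvm": {"using_compressed_ordinary_object_pointers": "false"}},
    }
}
print(nodes_without_compressed_oops(sample))
```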

No problem at all, and thanks for the recommendation.

Thanks for the reply, I was not aware of the async search feature (for something similar, I have used partitions to break up aggregations and retrieve results via the API for batch reporting, but unfortunately it doesn't seem doable via Kibana visualizations).
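In case it helps anyone else, the partitioning approach looks roughly like the sketch below: one search body per partition, using the `include` option of the terms aggregation with `partition` and `num_partitions`. The field name `account_id`, the aggregation name `by_term` and the sizes are placeholders, not our real mapping, and sending each body to `_search` is omitted.

```python
import json


def partitioned_terms_bodies(field, num_partitions, size):
    """Yield one search body per partition of a terms aggregation on `field`."""
    for partition in range(num_partitions):
        yield {
            "size": 0,  # aggregation results only, no hits
            "aggs": {
                "by_term": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size,
                    }
                }
            },
        }


# Placeholder values: 20 partitions over a hypothetical "account_id" field.
bodies = list(partitioned_terms_bodies("account_id", 20, 10000))
print(json.dumps(bodies[0], indent=2))
```

Each request then only has to hold one slice of the terms in memory at a time, at the cost of running the query once per partition.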

That's an interesting point about the CPU caches, I hadn't considered that aspect. Will have a think about establishing some baseline performance metrics and test the effects of increasing the heap on a single node before jumping into anything.
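As a starting point for the baseline, I'm thinking of comparing two snapshots of the `jvm` section from `GET _nodes/stats/jvm` for the same node, taken before and after a test run. The sketch below is only the diffing step; the helper name and the numbers are invented for illustration, and the snapshots would come from the real API.

```python
def gc_delta(before, after):
    """Per-collector change in GC count and time between two jvm stats snapshots.

    Both arguments are the `jvm` object of a single node from the nodes stats
    API, i.e. they contain gc.collectors.<name>.collection_count and
    gc.collectors.<name>.collection_time_in_millis.
    """
    result = {}
    for name, stats in after["gc"]["collectors"].items():
        prev = before["gc"]["collectors"].get(name, {})
        count = stats["collection_count"] - prev.get("collection_count", 0)
        millis = stats["collection_time_in_millis"] - prev.get("collection_time_in_millis", 0)
        result[name] = {
            "collections": count,
            "total_ms": millis,
            "avg_ms": millis / count if count else 0.0,
        }
    return result


# Invented numbers, only to show the shape of the input and output.
before = {"gc": {"collectors": {
    "young": {"collection_count": 100, "collection_time_in_millis": 2000},
    "old": {"collection_count": 2, "collection_time_in_millis": 400},
}}}
after = {"gc": {"collectors": {
    "young": {"collection_count": 150, "collection_time_in_millis": 3500},
    "old": {"collection_count": 3, "collection_time_in_millis": 1300},
}}}
print(gc_delta(before, after))
```

Running the same workload at 30GB and then at the larger heap should show whether average pause times per collection actually get worse.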

Thanks!
