OOM since 8.16.1 with openjdk23

ALIT · January 9, 2025, 2:50pm

Yep, as mentioned here, I can share it.
@Ignacio_Vera: is it safe to post the file as a gist? The file is 73MB and, as far as I can see, the only "private data" is the hostname. Anything I should consider? As it's our company ES, I would not like to share any confidential data.

Ignacio_Vera · January 10, 2025, 11:48am

It should be safe. We only need the initial few hundred lines of the file. If you feel more comfortable you can share it in a private message.

Ignacio_Vera · January 13, 2025, 1:31pm

We just uncovered a bug in the enrich cache, introduced in 8.16, where the cache slowly grows until it exhausts the JVM heap. If you are using enrich and facing OOM, the way to avoid it is to set enrich.cache_size back to its previous value of 1000.

github.com/elastic/elasticsearch

Enrich cache default can usage a ton of memory

opened 12:17PM - 12 Jan 25 UTC

nik9000

>bug :Data Management/Ingest Node Team:Data Management v8.16.0 v8.17.0

### Elasticsearch Version

8.16+

### Installed Plugins

_No response_

### Java… Version

_bundled_

### OS Version

any

### Problem Description

In 8.16 we changed the default for `enrich.cache_size` from `1000` to `1%` which should allow us to more carefully cache. The trouble is that we don't account for a few things in the cache's weight:

1. The cache key. I'm seeing those in the 600 bytes size.
2. Empty results.
3. Results use their serialized size but are then stored as `Map<String, Object>` which is usually much larger.
4. It sure *looks* like there isn't any intrinsic cost to a cache entry. Like, for the entry itself.

When you combine the first two, queries pointing to no documents in the enrich index cost 0 bytes. So, no matter the limit you configure, you *can* fill up memory with them.

But, yeah, I *think* we should include the size of the cache key. And Include an estimate of the cost of the List we store in the cache - even if it's empty. And, finally, I think we should uplift the cost of the exploded map. Or we should cache the serialized copy.

I'm not entirely sure if it's worth a cost for the cache entry itself. That bit I don't know.

### Steps to Reproduce

See above.

### Logs (if relevant)

@nik9000 has a heap dump of a node the OOMed full of cache entries of empty results lists.
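To make the failure mode in the issue concrete, here is a toy model in Python. It is not Elasticsearch's cache code: the class `WeightedLRUCache`, both weigher functions, and the `ENTRY_OVERHEAD` constant are hypothetical names and values chosen for illustration. It only shows why a weight-bounded cache never evicts anything when lookups with empty results weigh zero, and how counting the key and a per-entry cost restores the bound.

```python
from collections import OrderedDict


class WeightedLRUCache:
    """Toy LRU cache that evicts by total weight rather than entry count."""

    def __init__(self, max_weight, weigher):
        self.max_weight = max_weight
        self.weigher = weigher  # function (key, value) -> weight
        self.entries = OrderedDict()
        self.total_weight = 0

    def put(self, key, value):
        # Replace an existing entry, removing its old weight first.
        if key in self.entries:
            self.total_weight -= self.weigher(key, self.entries.pop(key))
        self.entries[key] = value
        self.total_weight += self.weigher(key, value)
        # Evict least recently inserted entries until we are under the limit.
        while self.total_weight > self.max_weight and self.entries:
            old_key, old_value = self.entries.popitem(last=False)
            self.total_weight -= self.weigher(old_key, old_value)


def results_only_weight(key, results):
    # Counts only the result payload: an empty result list weighs 0.
    return sum(len(r) for r in results)


ENTRY_OVERHEAD = 64  # hypothetical fixed cost per cache entry


def key_aware_weight(key, results):
    # Also counts the key and a fixed per-entry cost, so misses are not free.
    return ENTRY_OVERHEAD + len(key) + sum(len(r) for r in results)


def fill_with_misses(weigher, lookups=10_000, max_weight=100_000):
    """Cache `lookups` distinct keys that all resolve to empty results."""
    cache = WeightedLRUCache(max_weight, weigher)
    for i in range(lookups):
        key = f"lookup-{i}".ljust(600, "x")  # roughly 600-character key
        cache.put(key, [])
    return len(cache.entries)


if __name__ == "__main__":
    print("results-only weigher keeps:", fill_with_misses(results_only_weight))
    print("key-aware weigher keeps:", fill_with_misses(key_aware_weight))
```

With the results-only weigher, every miss is retained regardless of `max_weight`, which matches the heap dump described in the issue. With the key-aware weigher, the entry count stays bounded by the configured weight.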

RainTown · January 16, 2025, 10:29am

Great that a bug was found,

A big THANKS to @ALIT and @Evesy for helping uncover it, with not a little troubleshooting effort too.

Evesy · January 16, 2025, 12:29pm

Thanks @Ignacio_Vera

I will look at reverting this value back to 1000 (though I'm not clear on where these settings are actually set: Set up an enrich processor | Elasticsearch Guide [8.17] | Elastic) and then dropping the mmap limit back down too.

Is it expected that this would cause issues in our cluster given that we do not make use of any enrichment processors?

Ignacio_Vera · January 16, 2025, 12:41pm

No, if you are not using the enrich processor, you should not be affected. I suspect this is something else.

Evesy · January 20, 2025, 8:43am

@Ignacio_Vera I have the crash log from one of our nodes after lowering the limit back to ~200k, this is on a cluster not using ingest.

It appears to be too large for Gists, and this forum doesn't support uploads of that file type. Is there a way I can get it over to you/the Elastic team? Happy to trim it down if you can advise exactly which bits of it are important.

Thanks in advance!

pavlodvornikov · March 27, 2025, 4:07pm

Hello, it looks like I'm facing the same issue that I described in Elasticsearch 8.17.2: Native memory allocation (mmap) failed

I checked wc -l /proc/x/maps on the old and upgraded clusters, which are loaded more or less the same, and the difference is quite big: the old one reports ~20k while the new one reports ~160k.

Were there any recent developments on this topic?

Ignacio_Vera · March 28, 2025, 11:47am

Following this comment (Performance degradation after upgrading from 8.6.1 to 8.16.1 · Issue #118623 · elastic/elasticsearch · GitHub), would you be able to add -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1 to disable grouping of multiple files into one shared segment?

ALIT · March 28, 2025, 12:49pm

I started one of my nodes with sharedArenaMaxPermits=1.
I will monitor map usage and report back on Monday.

Also started to monitor the number of maps every 30 seconds on all nodes.
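For anyone who wants to collect the same numbers, here is a minimal Python sketch of that kind of monitoring, assuming a Linux host where `/proc/<pid>/maps` and `/proc/sys/vm/max_map_count` are readable. The function names, the `proc_root` parameter, and the injectable `sleep` are my own choices for illustration, not anything from the thread; the 30-second default interval is taken from the post above.

```python
import time
from pathlib import Path


def count_maps(pid, proc_root="/proc"):
    """Count a process's memory mappings, roughly like `wc -l /proc/<pid>/maps`."""
    with open(Path(proc_root) / str(pid) / "maps") as maps_file:
        return sum(1 for _ in maps_file)


def read_max_map_count(proc_root="/proc"):
    """Read the kernel's per-process mapping limit (vm.max_map_count)."""
    return int((Path(proc_root) / "sys/vm/max_map_count").read_text())


def sample_maps(pid, samples, interval_seconds=30, proc_root="/proc",
                sleep=time.sleep):
    """Take `samples` readings and return (timestamp, count, fraction_of_limit)."""
    limit = read_max_map_count(proc_root)
    readings = []
    for i in range(samples):
        count = count_maps(pid, proc_root)
        readings.append((time.time(), count, count / limit))
        if i < samples - 1:
            sleep(interval_seconds)  # wait before the next reading
    return readings


if __name__ == "__main__":
    import sys

    # Usage: python map_monitor.py <elasticsearch-pid>
    for timestamp, count, fraction in sample_maps(int(sys.argv[1]), samples=3):
        print(f"{timestamp:.0f} maps={count} ({fraction:.1%} of vm.max_map_count)")
```

Plotting the count over time for a node with and without the option makes a steady upward trend easy to spot before the process reaches the limit.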

ALIT · March 31, 2025, 7:13am

@Ignacio_Vera - looks like it helped!

This is my graph of two different nodes in the cluster, after applying the change on the first (green) node.

Funnily enough, just a few minutes ago, the second node crashed when it hit 500k maps.

Ignacio_Vera · March 31, 2025, 8:02am

Thank you!

Looking at the graph, it does look like a leak but we will need to dig in.

ALIT · May 2, 2025, 7:14am

Is there any news on this topic?
