OOM since 8.16.1 with openjdk23

Yep, as mentioned here, I can share it.
@Ignacio_Vera: is it safe to post the file as a gist? The file is 73 MB and, as far as I can see, the only "private data" is the hostname. Anything I should consider? As it's our company's ES, I would not like to share any confidential data.

It should be safe. We only need the initial few hundred lines of the file. If you feel more comfortable you can share it in a private message.

We just uncovered a bug in the enrich cache, introduced in 8.16, where the cache slowly grows until it exhausts the JVM heap. If you are using enrich and facing OOM, the way to avoid it is to set enrich.cache_size back to its previous value of 1000.

Great that a bug was found.

A big THANKS to @ALIT and @Evesy for helping uncover it, with not a little troubleshooting effort too.

Thanks @Ignacio_Vera

I will look at reverting this value back to 1000, though I'm not clear on where these settings are actually set (Set up an enrich processor | Elasticsearch Guide [8.17] | Elastic), and then at dropping the mmap limit back down too.

Is it expected that this would cause issues in our cluster given that we do not make use of any enrichment processors?

No, if you are not using the enrich processor, you should not be affected. I suspect this is something else.

@Ignacio_Vera I have the crash log from one of our nodes after lowering the limit back to ~200k. This is on a cluster not using ingest.

It appears to be too large for a Gist, and this forum doesn't support uploads of that file type. Is there a way I can get it over to you/the Elastic team? Happy to trim it down if you can advise exactly which bits of it are important.

Thanks in advance!

Hello, it looks like I'm facing the same issue that I described in Elasticsearch 8.17.2: Native memory allocation (mmap) failed

I checked wc -l /proc/x/maps on the old and upgraded clusters, which are loaded more or less the same, and the difference is quite big: the old one reports ~20k while the new one reports ~160k.

Were there any recent developments on this topic?

Following this comment in Performance degradation after upgrading from 8.6.1 to 8.16.1 · Issue #118623 · elastic/elasticsearch · GitHub, would you be able to add the JVM option -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1 to disable grouping of multiple files into one shared segment?

I started one of my nodes with sharedArenaMaxPermits=1.
I will monitor map usage and report back on Monday.

Also started to monitor the number of maps every 30 seconds on all nodes.
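
For anyone who wants to collect the same numbers, here is a minimal sketch of that kind of monitoring. It assumes a Linux node where the PID of the Elasticsearch process is already known; the function names (count_maps, read_max_map_count, monitor) are made up for illustration and are not part of any Elasticsearch tooling.

```python
import time
from pathlib import Path


def count_maps(pid, proc_root="/proc"):
    """Count the memory mappings of a process.

    Counts the lines of <proc_root>/<pid>/maps, which for this file
    gives the same number as `wc -l /proc/<pid>/maps`.
    """
    with open(Path(proc_root) / str(pid) / "maps", "rb") as f:
        return sum(1 for _ in f)


def read_max_map_count(proc_root="/proc"):
    """Read the kernel limit on mappings per process (vm.max_map_count)."""
    path = Path(proc_root) / "sys" / "vm" / "max_map_count"
    return int(path.read_text().strip())


def monitor(pid, interval=30.0, samples=None, proc_root="/proc"):
    """Yield (unix_timestamp, map_count) every `interval` seconds.

    Runs forever when `samples` is None, otherwise stops after
    `samples` readings. `proc_root` is only a parameter so the
    functions can be pointed at a test directory.
    """
    taken = 0
    while samples is None or taken < samples:
        yield time.time(), count_maps(pid, proc_root)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Looping over monitor(pid) and printing each count next to read_max_map_count() gives a simple time series that can be graphed per node. Reading /proc/<pid>/maps of another user's process usually requires running as that user or as root.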

@Ignacio_Vera - looks like it helped!

This is my graph of two different nodes in the cluster, after applying the change on the first (green) node.

Funnily enough, just a few minutes ago, the second node crashed when it hit 500k maps :wink:

Thank you!

Looking at the graph, it does look like a leak, but we will need to dig in.

Is there any news on this topic?