Elasticsearch 8.17.2: Native memory allocation (mmap) failed

Hello,

I recently upgraded my Elasticsearch cluster from 8.10.4 to 8.17.2. Before the upgrade I had not experienced any memory issues under the same load.

Data nodes run on i4i.2xlarge EC2 instances with 64G RAM and 8 vCPU.

Elasticsearch runs on Ubuntu 20.04 and was installed from the Debian package. Memory is locked with bootstrap.memory_lock: true, and the JVM heap size is set to -Xms8g -Xmx8g.

Since the upgrade, the Elasticsearch service occasionally crashes on one of the data nodes with the following log:

# journalctl -u elasticsearch.service

systemd-entrypoint[20439]: #
systemd-entrypoint[20439]: # There is insufficient memory for the Java Runtime Environment to continue.
systemd-entrypoint[20439]: # Native memory allocation (mmap) failed to map 16384 bytes. Error detail: committing reserved memory.
systemd-entrypoint[20439]: # An error report file with more information is saved as:
systemd-entrypoint[20439]: # /var/log/elasticsearch/hs_err_pid20439.log
systemd-entrypoint[20439]: [thread 219314 also had an error]
systemd-entrypoint[20439]: [thread 21332 also had an error]
systemd-entrypoint[20439]: [thread 219313 also had an error]
systemd-entrypoint[20439]: [thread 21335 also had an error]
systemd-entrypoint[20377]: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007fa20c630000, 16384, 0) failed; error='Not enough space' (errno=12)
systemd-entrypoint[20377]: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007fa2c0998000, 16384, 0) failed; error='Not enough space' (errno=12)
systemd-entrypoint[20377]: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f8a65c9f000, 16384, 0) failed; error='Not enough space' (errno=12)
systemd-entrypoint[20377]: ERROR: Elasticsearch exited unexpectedly, with exit code 1
systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

There are no other heavy processes running on the same server, just a couple of agents for collecting logs and metrics.

Could someone please give me some pointers on how to debug this?

Wild guess, but this thread had something similar.
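
Separately from that thread, one thing that may be worth ruling out: an errno=12 on a 16384-byte mmap does not necessarily mean the host is out of RAM. On Linux, mmap can also return ENOMEM when the process has reached its maximum number of memory mappings (vm.max_map_count). Whether that is what is happening here is only a guess. Below is a minimal sketch, assuming a standard Linux /proc layout, that compares the number of mappings of a process with that limit. The function names are made up for illustration, and reading /proc/<pid>/maps of the Elasticsearch process will probably need root.

```python
import sys
from pathlib import Path
from typing import Tuple


def count_mappings(maps_text: str) -> int:
    """Count mappings: /proc/<pid>/maps has one non-empty line per mapping."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def map_usage(pid: int, proc_root: str = "/proc") -> Tuple[int, int]:
    """Return (mappings in use, vm.max_map_count) for the given pid.

    proc_root is a parameter only so the function can be pointed at a
    fake directory tree for testing; on a real host leave the default.
    """
    root = Path(proc_root)
    used = count_mappings((root / str(pid) / "maps").read_text())
    limit = int((root / "sys" / "vm" / "max_map_count").read_text().strip())
    return used, limit


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    pid = int(sys.argv[1])
    used, limit = map_usage(pid)
    print(f"pid {pid}: {used} mappings of {limit} allowed ({used / limit:.1%})")
```

Run it with the pid of the Elasticsearch JVM as the only argument, ideally on a schedule, since the count right before a crash is the interesting one. If the number stays far below the limit, this guess can be discarded and the hs_err_pid file is the next place to look.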

Thanks, it does indeed look like the same issue. I'll follow up in that thread.