There is insufficient memory for the Java Runtime Environment to continue

Hello, I need help. Our cluster has 8 data nodes, and each Elasticsearch data-node VM has 16GB of RAM.
Normally the Elasticsearch data nodes use 60-65% of RAM; I check this with the "top" command on Ubuntu. Unfortunately, roughly once a month the Elasticsearch service on one of the data nodes stops with the error below.
I checked Grafana and the heap was about 6GB when the node restarted. So 8GB (heap) + 4GB (max direct memory) + 1GB (OS, probably even 500MB is enough) = 13GB, which should still leave about 2GB free.

vm.max_map_count on the data nodes is 262144.

Could you please help if you have run into this problem before? Thank you.




There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 16384 bytes. Error detail: committing reserved memory.

Possible reasons:
  The system is out of physical RAM or swap space
  This process has exceeded the maximum number of memory mappings (check below
    for /proc/sys/vm/max_map_count and Total number of mappings)
  This process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
Possible solutions:
  Reduce memory load on the system
  Increase physical memory or swap space
  Check if swap backing store is full
  Decrease Java heap size (-Xmx/-Xms)
  Decrease number of Java threads
  Decrease Java thread stack sizes (-Xss)
  Set larger code cache with -XX:ReservedCodeCacheSize=
JVM is running with Zero Based Compressed Oops mode in which the Java heap is
  placed in the first 32GB address space. The Java Heap base address is the
  maximum limit for the native heap growth. Please use -XX:HeapBaseMinAddress
  to set the Java Heap base and to place the Java Heap above 32GB virtual address.
This output file may be truncated or incomplete.



Out of Memory Error (os_linux.cpp:2936), pid=2406074, tid=3962668

JRE version: OpenJDK Runtime Environment (24.0+36) (build 24+36-3646)
Java VM: OpenJDK 64-Bit Server VM (24+36-3646, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -F%F -- %E" (or dumping to /usr/share/elasticsearch/core.2406074)



---------------  S U M M A R Y ------------

Command Line: -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j2.formatMsgNoLookups=true -Djava.locale.providers=CLDR -Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=24 -Des.distribution.type=deb -Des.java.type=bundled JDK --enable-native-access=org.elasticsearch.nativeaccess,org.apache.lucene.core --enable-native-access=ALL-UNNAMED --illegal-native-access=deny -XX:ReplayDataFile=/var/log/elasticsearch/replay_pid%p.log -Des.entitlements.enabled=true -XX:+EnableDynamicAgentLoading -Djdk.attach.allowAttachSelf=true --patch-module=java.base=lib/entitlement-bridge/elasticsearch-entitlement-bridge-9.0.0.jar --add-exports=java.base/org.elasticsearch.entitlement.bridge=org.elasticsearch.entitlement,java.logging,java.net.http,java.naming,jdk.net -Xms8g -Xmx8g -Djava.io.tmpdir=/tmp/elasticsearch-15812986021906771924 -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -XX:MaxDirectMemorySize=4294967296 -XX:InitiatingHeapOccupancyPercent=30 -XX:G1ReservePercent=25 --module-path=/usr/share/elasticsearch/lib --add-modules=jdk.net --add-modules=jdk.management.agent --add-modules=ALL-MODULE-PATH -Djdk.module.main=org.elasticsearch.server org.elasticsearch.server/org.elasticsearch.bootstrap.Elasticsearch

Host: INTEL(R) XEON(R) GOLD 6530, 8 cores, 15G, Ubuntu 22.04.5 LTS
Time: Mon Nov 24 13:30:21 2025 +04 elapsed time: 5879127.478827 seconds (68d 1h 5m 27s)




Welcome to the forum.

Er, why is the “max direct memory” 4GB ?

EDIT: I see you have set this option via -XX:MaxDirectMemorySize=4294967296. Note that this memory and the heap are NOT the total memory that elasticsearch might use. Look at the rss field from ps to get a guide to the actual (current) process size in memory, and vsz for the total process size (noting that number can be massively misleading).
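For example, something like this (the pgrep pattern here is just an assumption based on the bootstrap class visible in your crash log; rss/vsz are reported in KiB):

ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
ps -o pid,rss,vsz,etime,comm -p "$ES_PID"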

FYI, in Linux the OS tries to use all memory as best it can, so a portion will almost always be allocated as file system cache (shown as buff/cache in top).

Does the “once a month” crash always happen at the same time, e.g. the first Tuesday at 2am, or is it (as far as you can tell) random? During working hours or out of working hours? Does just one data node exit, and if so is it always the same one, or does it vary across the 8?

Do you have OS logs around the time of the crash? What do they show?

Please also supply the elasticsearch version.

Elasticsearch also ships with its own internal monitoring; have you looked there? Before the crash, are there any sudden spikes?

Have you tried just increasing the value of /proc/sys/vm/max_map_count ?
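Something along these lines would do it; the new value and the sysctl.d file name here are only illustrative, not a recommendation:

sysctl vm.max_map_count                        # show the current limit (262144 here)
sudo sysctl -w vm.max_map_count=524288         # raise it at runtime
echo 'vm.max_map_count=524288' | sudo tee /etc/sysctl.d/99-elasticsearch.conf   # persist across reboots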


Thank you for the reply. My Elasticsearch version is 9.0.0.

  1. -XX:MaxDirectMemorySize is not set by me; I only set -Xms8g and -Xmx8g. I think Elasticsearch sets it automatically.

  2. The /proc/sys/vm/max_map_count value is 262144. ChatGPT said that this is the recommended value.

a) For the Elasticsearch JVM:
RSS: 10001156 KB ≈ 9.53 GB
VSZ: 209930568 KB ≈ 200 GB (normal for JVM)

b) Elasticsearch-cli is 102MB

  3. The “once a month” interval is not fixed; sometimes it is twice in a week. And it is not the same node: the problem has happened on different data nodes at different times (at night, in the morning). All data nodes have the same RAM, CPU and JVM options.

  4. I checked journalctl:
    journalctl -u elasticsearch.service
    Sep 17 12:24:37 prod-elastic-data01 systemd[1]: Stopping Elasticsearch...
    Sep 17 12:24:51 prod-elastic-data01 systemd[1]: elasticsearch.service: Deactivated successfully.
    Sep 17 12:24:51 prod-elastic-data01 systemd[1]: Stopped Elasticsearch.
    Sep 17 12:24:51 prod-elastic-data01 systemd[1]: elasticsearch.service: Consumed 1month 1d 1h 26min 49.671s CPU time.
    Sep 17 12:24:52 prod-elastic-data01 systemd[1]: Starting Elasticsearch...
    Sep 17 12:25:22 prod-elastic-data01 systemd[1]: Started Elasticsearch.
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406074]: #
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406074]: # There is insufficient memory for the Java Runtime Environment to continue.
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406074]: # Native memory allocation (mmap) failed to map 16384 bytes. Error detail: committing reserved memory.
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406074]: [thread 3962669 also had an error]
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406074]: # An error report file with more information is saved as:
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406074]: # /var/log/elasticsearch/hs_err_pid2406074.log
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406001]: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f4333466000, 16384, 0) failed; error='Not enough space' (errno=12)
    Nov 24 13:30:21 prod-elastic-data01 systemd-entrypoint[2406001]: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f4362afa000, 16384, 0) failed; error='Not enough space' (errno=12)
    Nov 24 13:30:24 prod-elastic-data01 systemd-entrypoint[2406001]: ERROR: Elasticsearch exited unexpectedly, with exit code 1
    Nov 24 13:30:24 prod-elastic-data01 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
    Nov 24 13:30:24 prod-elastic-data01 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
    Nov 24 13:30:24 prod-elastic-data01 systemd[1]: elasticsearch.service: Consumed 2month 2d 52min 34.705s CPU time.

  5. I could not see anything relevant for today when I ran “dmesg --ctime”, even though Elasticsearch stopped again today.

Thanks for all the answers.

This is your issue.

I tend to agree with ChatGPT that you should try increasing that value as a first step.

You can monitor the current usage via:

for pid in $(pidof java); do
  # only look at the Elasticsearch JVM (started with MaxDirectMemorySize on its command line)
  sudo grep -qa MaxDirectMemorySize /proc/$pid/cmdline && \
  sudo wc -l /proc/$pid/maps   # one line per memory mapping
done

If it is monotonically and steadily growing at a decent rate, that means you are going to hit the limit at some point.

I vaguely remember a thread where there was (IIRC) a bug that meant this limit was more easily hit. It led to this bug, which I’m not saying is your issue, but it just might be related. There is a workaround available, and if I’ve read the bug correctly that workaround has been implemented in 9.0.8+; indeed, in 9.2.1 I do see the -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1 JVM arg present.
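If you want to try that workaround, adding the flag via a custom JVM options file is the usual route on a package install; the path and file name below are assumptions for a deb layout, so adjust as needed:

echo '-Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1' | \
  sudo tee /etc/elasticsearch/jvm.options.d/mmap-workaround.options
sudo systemctl restart elasticsearch   # the flag only takes effect after a node restart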

Did the issue appear after some upgrade, to 9.0.0? If so, what were you on before? If not, has it been there since the system was commissioned?

EDIT: This is also useful information.

Thank you for the reply)

I ran:

for pid in $(pidof java); do
  sudo grep -qa MaxDirectMemorySize /proc/$pid/cmdline && \
  sudo wc -l /proc/$pid/maps
done

The result increased from 85009 to 85125, then decreased to 85059. I also checked Prometheus, since we run node exporter on the data nodes, and I saw that the node’s RAM limit was reached when Elasticsearch stopped. So it really is a native memory exhaustion error.
Yes, we upgraded Elasticsearch from 8.13.4 to 9.0.0.

Yeah, that looks sort of normal; over short time intervals you can expect some oscillation. But:

I’m not sure I understand the wording: “real native memory”? As opposed to what?

And was the data-node crashing problem present only after the upgrade to 9.0.0? If so, that’s a little suggestive that you could be hitting the specific bug I noted, though it could just be a coincidence too. And that bug might be unrelated, just a distraction.

You seem to have at least 3 options open to you:

  1. try increasing the value of /proc/sys/vm/max_map_count
  2. upgrade to 9.0.8+
  3. use the workaround discussed in the GitHub issue

and monitor the system to see if those changes help.

If you want to do nothing for now, then put in place specific monitoring so that when data node X does crash next time you will learn more, e.g. track the value of /proc/<pid>/maps, track overall process size in RAM (rss) for all processes, and so on. You may wish to consider adding -XX:NativeMemoryTracking=detail to the JVM args, and using jcmd to track native memory usage periodically (see the sketch after the options below).

EDIT: Add these to the JVM options file:

-XX:+UnlockDiagnosticVMOptions
-XX:NativeMemoryTracking=detail
-XX:+PrintNMTStatistics
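Once those options are in place and the node has been restarted, you could snapshot native memory usage roughly like this. Only a sketch: the single-java-process assumption, the elasticsearch service user and the bundled-JDK path are assumptions for a deb install, so adjust to your setup.

ES_PID=$(pidof java)                          # assumes the data node runs a single JVM
JCMD=/usr/share/elasticsearch/jdk/bin/jcmd    # bundled JDK location (assumption)
sudo -u elasticsearch "$JCMD" "$ES_PID" VM.native_memory baseline      # record a baseline once
# ...some time later...
sudo -u elasticsearch "$JCMD" "$ES_PID" VM.native_memory summary.diff  # per-category growth since the baseline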

I also welcome others to make suggestions here; there are a lot of bright people on the forum!


I am generally a bit careful with the very first version of a major release and tend to wait to upgrade any production system until the first minor release has been introduced. I would therefore recommend you upgrade to the very latest version available and see if that resolves the issue.

When Elasticsearch crashed, the data node had reached 100% of its 16GB physical RAM, which I confirmed via Prometheus/node-exporter.

Thank you so much. Yes, I will add these options to the JVM options. For now I will decrease the heap size to -Xms6g/-Xmx6g and add these options, so that when it crashes again I can find the root cause. I don’t want to change vm.max_map_count for now, because I am not sure it is the cause of this issue.

Thank you. Yes, I suggested upgrading Elasticsearch to a newer version than 9.0.0, because I can sometimes see some anomalies in APM too. But my team lead told me to investigate the problem, not to upgrade)

Given that @RainTown pointed to a bug in the version you are using I would think your time would be better spent upgrading than debugging an issue for which the solution will be to upgrade anyway. Might be worth showing this thread to your team lead. :slight_smile:

Well, the error message told you explicitly “Native memory allocation (mmap) failed to map 16384 bytes”.

Your operating system has RAM. It uses some of it itself, and other applications use some of it. The rest is “up for grabs” by applications.

elasticsearch is a java application. The JVM uses RAM in lots of ways, but 3 primary ones are Heap, Direct Memory, and “Native memory”. Your JVM args limit the size of the first 2. The last, “native memory”, is effectively limited by the remaining available RAM. Generally, the biggest user of “Native memory” in elasticsearch would be mmap-ed files.

Note the setting vm.max_map_count limits the number of those “chunks” of memory that can be mmap-ed. But the bug means Elasticsearch uses more of them than it really could/should.
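As a rough check of how close a node is to that limit (assuming the only java process on the host is the Elasticsearch JVM):

ES_PID=$(pidof java)
cat /proc/sys/vm/max_map_count    # the limit (262144 in your case)
sudo wc -l /proc/$ES_PID/maps     # current number of mappings, one line per mapping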

Well, you have investigated. If you want to rule out the bug I pointed to, just use the workaround. Or upgrade. I suggest the latter.

This is up to you. But reducing the heap will likely just mean it takes a little longer until you hit the same issue. Reducing the heap size requires a JVM restart, which is exactly all you need to do to apply the workaround I pointed to. Actually, one JVM restart per node is also all you need for a version upgrade!

Right now I don’t know if it was the number of “chunks” that was hit, or the total size of the chunks, or indeed something entirely different. But either way, elasticsearch now ships with a fix for that specific issue and you really should take advantage of that. IMHO.


Dear Kevin, thank you. I think decreasing the heap size will only delay the issue. The best way is upgrading to 9.0.8+. Thank you!

Yeah right) I think upgrading is the best solution. Thank you!


If you are upgrading, please upgrade to the latest version and not some earlier version, e.g. 9.0.x.

I will upgrade to 9.1.x, if I do upgrade. Or I will add -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1

I had a new thought about this. That bug is around the maximum number of these mmap-ed regions: effectively the bug-impacted releases use them up quicker, and eventually run out. I.e. it’s not the amount of memory that runs out, it’s that the number of mmap-ed files is limited.

But your error is more explicit, and the lack of available memory seems corroborated by other data.

In both threads here that are linked to that GitHub issue the error was reported as

# Native memory allocation (malloc) failed to allocate 1048576 bytes. 

The bug means it was not actually the 1048576 bytes that was unavailable; it’s a slot that was unavailable!

This is not the same error you saw, which was very specific to mmap, not malloc:

In this case it might be that it really is the 16384 bytes rather than the slot.

So, though I still think it is very worthwhile to upgrade and then check, and I see no reason not to upgrade to 9.2.1 rather than 9.1.x, you should also not be surprised if all we did here is rule out a cause rather than find the “smoking gun”. That will still be progress!

Always try to change one thing at a time.


Yes, probably. But with some possible cost: there is less heap and less direct memory, and therefore (depending on load) more GC, maybe slower queries/indexing, etc. None of those things might be measurable, if they exist at all. And it might also slow the issue down to the point that it effectively eliminates it.

Problem solving in IT is just hard sometimes!

Yes, the physical RAM was exhausted when Elasticsearch crashed. I have 8 data nodes. I will add the options you sent me before:

-XX:+UnlockDiagnosticVMOptions
-XX:NativeMemoryTracking=detail
-XX:+PrintNMTStatistics

But I am not sure whether I should set -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1 or not)) My brain is getting mixed up :smiley:

If they approve, I will upgrade.

If you are asking for advice, I’d make those changes anyway. Those settings are just for diagnostic purposes.

The “change one thing at a time” point applies, so I would EITHER add “-Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=1” OR I’d change heap size. Not both.

And after making those changes I’d be tracking the /proc/<pid>/maps files, and tracking the native memory usage with jcmd (if I made the NativeMemoryTracking change), say every minute on every node.
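A rough sketch of what that per-minute tracking could look like, run as root on each data node; the log path, service user, bundled-JDK path and interval are all assumptions to adapt:

#!/usr/bin/env bash
LOG=/var/log/es-native-mem.log
JCMD=/usr/share/elasticsearch/jdk/bin/jcmd
while true; do
  for pid in $(pidof java); do
    {
      echo "=== $(date '+%F %T') pid=$pid ==="
      echo "mappings: $(wc -l < /proc/$pid/maps)"
      ps -o rss=,vsz= -p "$pid" | awk '{print "rss_kb=" $1, "vsz_kb=" $2}'
      # the jcmd line only works if NativeMemoryTracking is enabled on the JVM
      sudo -u elasticsearch "$JCMD" "$pid" VM.native_memory summary | head -n 20
    } >> "$LOG"
  done
  sleep 60
done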