Elasticsearch uses more memory than JVM heap settings, reaches container memory limit and crashes

vroad · February 11, 2020, 11:26pm

Elasticsearch uses more memory than the JVM heap settings, which are currently -Xms512m and -Xmx512m. I tried setting those values to 1g but reverted, because the container crashed with an OOM immediately after relaunching.

I run Elasticsearch 7.3.0 on ECK, and memory usage is reported by Prometheus node exporter.
The memory limit is set to 1.5GiB to leave some memory for the EC2 instances, which are 3x t3.small with 2GB of RAM each.

Is it a bad idea to set a memory limit for Elasticsearch containers? I'm not sure whether the limit includes virtual memory or not. If it does, that might cause the container to crash even when enough memory is available.

DavidTurner · February 12, 2020, 3:14am

Quoting the docs on setting the heap size (emphasis mine):

Set Xmx and Xms to no more than 50% of your physical RAM. Elasticsearch requires memory for purposes other than the JVM heap and it is important to leave space for this. For instance, Elasticsearch uses off-heap buffers for efficient network communication, relies on the operating system’s filesystem cache for efficient access to files, and the JVM itself requires some memory too. It is normal to observe the Elasticsearch process using more memory than the limit configured with the Xmx setting.

In a container, "physical RAM" means the memory limit of the container. If you have a 1.5GiB memory limit on your container then you must set the Elasticsearch heap size to no more than 0.75GiB.
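As a rough illustration of the arithmetic above, here is a minimal Python sketch that derives the largest heap the 50% guideline allows for a given container limit. The function names (`max_heap_bytes`, `jvm_heap_flags`) are made up for this example and are not part of Elasticsearch or ECK.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_heap_bytes(container_limit_bytes: int, heap_fraction: float = 0.5) -> int:
    """Upper bound for the JVM heap, given the container memory limit.

    heap_fraction defaults to the 50% guideline quoted above; values
    above 0.5 are rejected because they leave too little off-heap room.
    """
    if not 0 < heap_fraction <= 0.5:
        raise ValueError("heap_fraction must be in (0, 0.5]")
    return int(container_limit_bytes * heap_fraction)


def jvm_heap_flags(container_limit_bytes: int) -> str:
    """Format matching -Xms/-Xmx flags (whole MiB) for the computed bound."""
    heap_mib = max_heap_bytes(container_limit_bytes) // MIB
    return f"-Xms{heap_mib}m -Xmx{heap_mib}m"


if __name__ == "__main__":
    limit = int(1.5 * GIB)  # the 1.5GiB container limit from this thread
    print(jvm_heap_flags(limit))  # -Xms768m -Xmx768m
```

With a 1.5GiB limit this gives 768MiB as the ceiling, so a 512MiB heap is within the guideline and a 1GiB heap is not.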

vroad · February 12, 2020, 3:59am

Then what is my problem?
The JVM heap is currently set to 512MB, which is less than 50% of the container memory limit.
In my case memory usage sometimes exceeds 1.5GB and the container crashes.
The container memory limit is 3 times as large as the heap size.

vroad · February 12, 2020, 6:03am

This guy on Stack Overflow reports an error similar to mine, and unsetting the memory limit fixes the problem.

He also says that Linux kernel 4.15 fixes the memory issue.

vroad · February 12, 2020, 6:30am

The issue is still not fixed...?

DavidTurner · February 12, 2020, 8:28am

Sorry, this wasn't clear. You said:

A 1GiB heap is definitely too large for a 1.5GiB container.

Yes, there are known bugs in some kernels that inappropriately trigger the OOM killer in a container. That still doesn't mean it's a bad idea to set the memory limit on an Elasticsearch container; it just means it's a bad idea to use a buggy kernel.

If you think it's not that, please share the full dmesg output from such a crash; it could be thousands of lines long, so use https://gist.github.com/ if it doesn't fit here.

jcastelc · February 13, 2020, 8:07am

Maybe it is related to JVM "metaspace" usage, not heap. Check the Java MetaspaceSize and MaxMetaspaceSize settings (Xmx and Xms too, of course).

Try several settings for heap/metaspace, and monitor JVM heap/metaspace usage with the "jstat" command before setting container memory limits.

https://docs.oracle.com/javase/8/docs/technotes/tools/windows/jstat.html
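For anyone who wants to track those numbers over time, here is a minimal Python sketch that parses the text printed by `jstat -gc <pid>` and sums heap and metaspace usage. It assumes the usual column names (S0U, S1U, EU, OU, MU) with values in KB; the sample text is invented for illustration and is not output from a real cluster, and the helper names are made up.

```python
def parse_jstat_gc(output: str) -> dict:
    """Map jstat -gc column names to values from the last sample line.

    Non-numeric cells (jstat prints "-" for counters that do not apply)
    are stored as None.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    cells = lines[-1].split()
    sample = {}
    for name, cell in zip(header, cells):
        try:
            sample[name] = float(cell)
        except ValueError:
            sample[name] = None
    return sample


def summarize(sample: dict) -> dict:
    """Heap used = survivor + eden + old generation; jstat reports KB."""
    heap_used_kb = sample["S0U"] + sample["S1U"] + sample["EU"] + sample["OU"]
    return {
        "heap_used_mib": heap_used_kb / 1024,
        "metaspace_used_mib": sample["MU"] / 1024,
    }


# Invented sample, shaped like `jstat -gc <pid>` output.
SAMPLE = """\
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC   YGCT  FGC  FGCT  CGC  CGCT   GCT
 0.0  1024.0   0.0  1024.0  26624.0  10240.0  496640.0  204800.0  98304.0  92160.0  13312.0  11264.0  42  0.512  0  0.000  -  -  0.512
"""

if __name__ == "__main__":
    print(summarize(parse_jstat_gc(SAMPLE)))
```

Comparing these figures with the container's total memory usage shows how much of the footprint is neither heap nor metaspace.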

DavidTurner · February 13, 2020, 8:25am

@jcastelc if you are seeing evidence of ongoing metaspace allocation in your cluster then I'd like to see more detail. It is rare to see metaspace memory pressure with Elasticsearch. I think its metaspace usage should be pretty much constant since I'm not aware of any dynamic loading happening after startup, and we account for this in the 2x limit described in the documentation.

jcastelc · February 13, 2020, 9:15am

Good to know. No evidence; it was only a suggestion. Thanks for the details!

vroad · February 17, 2020, 8:24am

My ES cluster crashed again, but memory usage of the ES container stayed within the 1.5GiB limit.
I've found that the EBS burst credit for the root volume was running out before the crash. This might be a problem other than memory usage...

This time I was able to get logs from the failed node. It logged lots of warnings from JvmGcMonitorService, but this might be caused by the loss of EBS burst credit for the root volume.

vroad · February 17, 2020, 8:35am

The EBS burst credit decreased slowly before the crash. Once it reaches 0, the pod gets evicted because of the resulting slow I/O.
I should solve this problem first...
Thank you for the answers anyway.

system · March 16, 2020, 8:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
