Elasticsearch Docker container keeps crashing with exit status of 137

Hi Everyone,

I have a docker-xpack Docker image that I deploy on Marathon/Mesos. The image works great and is very stable when xpack security is disabled. When I enable xpack security and use encrypted comms, the containers crash randomly every 2-3 days with no message other than that the container exited with status 137. Status 137 (128 + 9, i.e. the process received SIGKILL) usually means Mesos killed the container because its RAM exceeded the configured max.
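For anyone unsure how to read that number, here is a minimal sketch of the usual "128 + signal number" convention for exit statuses. The function name `describe_exit_status` is made up for illustration. A SIGKILL on its own does not prove an out-of-memory kill; checking the `State.OOMKilled` field from `docker inspect` on the dead container is one way to confirm it.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain an exit status using the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            # Signal names are platform dependent; 9 is SIGKILL on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited normally with code {status}"


print(describe_exit_status(137))
```

On a Linux host this should report SIGKILL for 137.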

I have upped both the JVM heap and the container RAM (via the Mesos configuration) from 2GB/4GB to 4GB/8GB, and I am still getting exits with 137. In addition, I instrumented one of the containers by calling docker stats every 5 minutes. The overall RAM keeps creeping up until it reaches the Mesos limit for the container, and then Mesos kills it off.
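The sampling described above could look roughly like the sketch below. It is not the script actually used here: it assumes the `docker` CLI is on the PATH and relies on `docker stats --no-stream --format "{{.MemUsage}}"`, which prints a field such as `3.97GiB / 4GiB`. The helper names (`to_bytes`, `parse_mem_usage`, `sample`, `watch`) and the container name are hypothetical.

```python
import re
import subprocess
import time

# Unit suffixes docker stats may print (binary and decimal).
_UNITS = {
    "B": 1,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
}


def to_bytes(text: str) -> float:
    """Convert a size like '3.97GiB' to bytes."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def parse_mem_usage(field: str) -> tuple[float, float]:
    """Split a MemUsage field 'used / limit' into (used_bytes, limit_bytes)."""
    used, limit = field.split("/")
    return to_bytes(used), to_bytes(limit)


def sample(container: str) -> tuple[float, float]:
    """Take one memory reading for a container via the docker CLI."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_mem_usage(out.strip())


def watch(container: str, interval_s: int = 300) -> None:
    """Print usage as a fraction of the limit every interval_s seconds."""
    while True:
        used, limit = sample(container)
        print(f"{time.strftime('%H:%M:%S')} {used / limit:.1%} of limit")
        time.sleep(interval_s)
```

Calling `watch("my-es-container")` would print one reading every 5 minutes; logging the fraction of the limit makes the slow creep easy to spot.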

Anyone else have trouble with an elasticsearch-xpack docker? Any ideas/suggestions are welcome.

Thanks

--John

What Elasticsearch version are you on?

Argh, sorry, I should have specified that--ES 5.6.4

More information. This is an Elasticsearch container with default jvm.options (and therefore 2GB memory heap) and 4 GB dedicated to the container.

The heap is staying steady in the 1GB range while the container is up to 3.97 GB. So there appears to be off-heap memory pushing total usage past 4 GB, at which point Mesos kills the container.
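To make the comparison concrete, here is a small sketch that subtracts the JVM heap in use from what the container is charged for. It assumes the heap figure comes from the `jvm.mem.heap_used_in_bytes` field of a `_nodes/stats/jvm` response; the `sample` dict, the node id, and the function names are hypothetical, and the two figures from above (roughly 1 GB heap, 3.97 GB container) are treated as GiB for simplicity.

```python
GIB = 1024 ** 3


def heap_used_bytes(node_stats: dict) -> dict:
    """Map node id -> heap bytes in use, from a parsed _nodes/stats/jvm response."""
    return {
        node_id: node["jvm"]["mem"]["heap_used_in_bytes"]
        for node_id, node in node_stats["nodes"].items()
    }


def outside_heap_bytes(container_bytes: float, heap_bytes: float) -> float:
    """Memory the container is charged for beyond the Java heap."""
    return container_bytes - heap_bytes


# Hypothetical response fragment using the rough numbers from this post.
sample = {"nodes": {"node-1": {"jvm": {"mem": {"heap_used_in_bytes": 1 * GIB}}}}}

heap = heap_used_bytes(sample)["node-1"]
gap = outside_heap_bytes(3.97 * GIB, heap)
print(f"{gap / GIB:.2f} GiB outside the heap")
```

With these inputs nearly 3 GiB sits outside the heap, which is why raising the heap size alone would not be expected to help.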

--John

Interesting update. I have a cluster with xpack security enabled, same docker image but with 31 GB memory heap and 62 GB docker memory and this one has been stable for several weeks.

Question: is anyone aware of a minimum memory heap size for elasticsearch with xpack enabled?

--John

More interesting information.

I set MALLOC_ARENA_MAX=4 and that slowed down the growth of off-heap memory, but did not stabilize it.

A colleague of mine who knows Elasticsearch way more than I do updated the refresh_rate from 10s to 300s.

The combo of MALLOC_ARENA_MAX=4 and refresh_rate=300s appears to have stabilized things for now.
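For reference, a hedged sketch of how the second change might be applied. It assumes "refresh_rate" here means the `index.refresh_interval` index setting, changed through the update index settings API (`PUT /<index>/_settings`). The base URL, index name, and function name are placeholders, and the request is only built, not sent. MALLOC_ARENA_MAX is a glibc environment variable, so it has to be set in the container's environment before the JVM starts.

```python
import json
import urllib.request


def build_refresh_request(base_url: str, index: str, interval: str = "300s"):
    """Build (but do not send) a request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and index name, for illustration only.
req = build_refresh_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending it would be `urllib.request.urlopen(req)`; with xpack security enabled the request would also need credentials, which this sketch leaves out.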

Will monitor and report an update in a few hours.

Update: on one of the clusters, memory usage for a 2GB/4GB combo is 2.38 GB, so the combination of MALLOC_ARENA_MAX=4 and refresh_rate=300s still appears to be keeping things stable there.

On the other cluster (with more data), the memory consumed by the ES Docker continues to increase, albeit more slowly.

Also, I can confirm that xpack is not causing this, as I've seen it both with xpack enabled and with it disabled.