ES Heap to 100% and cluster halt

Jose_E_Pettoruti · March 23, 2015, 10:28am

Hi guys
Hope one of you can help...
In our prod environment, we have a cluster of 5 data nodes (data:true,
master:false) plus 3 master nodes (master:true, data:false), running
Elasticsearch 1.4.4 on Oracle Java 1.8.0_40.
Data nodes have 30GB of memory, masters 15GB.
We have a problem where heap usage on some nodes climbs to the heap limit, and
the whole cluster comes to a stop. This happens on maybe one or two nodes,
while the others are still OK.
No out-of-memory errors are logged, but on the nodes that are still
alive you can see errors like "No search context for id [xxxxx]". I
need to restart the whole cluster for it to become responsive again.
Heap usage behaves properly for a while, following a nice sawtooth
pattern, but after about a day the heap on some node starts climbing
without ever dropping, until it hits the limit.

You can see some of this in this graph of one of our crashes:

https://lh4.googleusercontent.com/-_08MBFfBKbM/VQ_oaPJAh3I/AAAAAAACFdI/fOt75itKbDE/s1600/Screen%2BShot%2B2015-03-23%2Bat%2B10.12.28.png
I also notice that CPU usage peaks when that rise starts.
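
To make the pattern concrete, here is a minimal Python sketch of how the "climbs without ever dropping" case could be told apart from the normal sawtooth, given a series of heap-used percentages per node. The function name, the window of 5 samples, the 85% threshold and the sample values are all hypothetical; they are not taken from our monitoring setup.

```python
def heap_never_drops(samples, window=5, threshold=85.0):
    """Return True if the last `window` heap samples never decrease
    and the most recent sample is at or above `threshold` percent.

    `window` and `threshold` are illustrative assumptions, not tuned values.
    """
    if len(samples) < window:
        return False
    tail = samples[-window:]
    # A healthy sawtooth shows at least one drop (a GC cycle) inside the window.
    rising = all(later >= earlier for earlier, later in zip(tail, tail[1:]))
    return rising and tail[-1] >= threshold


# Made-up heap-used percentages, for illustration only.
healthy = [40, 55, 70, 42, 58, 72, 45, 60]      # sawtooth: GC reclaims memory
stuck = [40, 55, 70, 42, 60, 75, 88, 95, 99]    # keeps climbing to the limit

print(heap_never_drops(healthy))  # False
print(heap_never_drops(stuck))    # True
```

A check like this only flags the symptom early enough to capture diagnostics before the node stalls; it says nothing about the cause.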

In elasticsearch.yml I don't have many important settings other than
bootstrap.mlockall: true.

In the environment variables file I have:

ES_HEAP_SIZE=15342m
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited
MAX_MAP_COUNT=262144

Memory usage on the nodes seems to be fine, with around 6GB free at all
times (even during the crashes).

Field data stays at around 300MB, while the filter cache is
1.5GB (10% of the heap, which is the default).

https://lh3.googleusercontent.com/-rKO9_C32pvY/VQ_qUKfXwAI/AAAAAAACFdU/KLkdKHFftKU/s1600/Screen%2BShot%2B2015-03-23%2Bat%2B10.25.25.png
(In that graph you can see the filter cache size on 2 nodes going up at the
end. That's when I increased it to 25% on those 2 nodes, but the effect was
the same: the cluster crashed the same way.)
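
For reference, the cache sizes quoted above follow directly from the heap setting. A small Python sketch of the arithmetic, assuming ES_HEAP_SIZE=15342m is read as 15342 MB and that the filter cache limit is a plain percentage of the heap (the helper name is hypothetical):

```python
HEAP_MB = 15342  # from ES_HEAP_SIZE=15342m


def cache_limit_mb(heap_mb, percent):
    """Size of a cache capped at `percent` of the heap, in MB."""
    return heap_mb * percent / 100.0


default_limit = cache_limit_mb(HEAP_MB, 10)  # the default 10% limit
raised_limit = cache_limit_mb(HEAP_MB, 25)   # the 25% tried on 2 nodes

print(f"10% of heap: {default_limit:.1f} MB")  # 1534.2 MB, i.e. about 1.5GB
print(f"25% of heap: {raised_limit:.1f} MB")   # 3835.5 MB, i.e. about 3.7GB
```

So even at 25%, the filter cache and field data together account for well under a third of the heap, which is why they do not seem to explain the heap reaching 100%.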

I wonder if this is related to
https://github.com/elastic/elasticsearch/issues/8249, but that seems to have
been fixed by 1.4.4.

Any help will be greatly appreciated.

Kind regards
Jose

