I set up an ES cluster a couple weeks ago dedicated to a specific search
and document pattern and have been experiencing problems with it since then.
Every 18-24 hours we need to restart our cluster because we run out of
heap; either there's a memory leak or a problem with GC. Here is an image
of sample memory usage:
We deployed with JDK 1.7.u25 and v0.90.5. Relevant stats:
4 nodes (AWS 2xlarge), 1 replica
16G reserved heap
15 shards per index, 25 indexes, only 11M docs, relatively uniformly
distributed over the indexes (I know the allocation is overkill right now,
but we're preparing for a huge influx of data)
200-500 searches/s
mlockall = true
Using the Java API in Scala
wrapper.java.additional.1=-Delasticsearch-service
wrapper.java.additional.2=-Des.path.home=%ES_HOME%
wrapper.java.additional.3=-Xss256k
wrapper.java.additional.4=-XX:+UseParNewGC
wrapper.java.additional.5=-XX:+UseConcMarkSweepGC
wrapper.java.additional.6=-XX:CMSInitiatingOccupancyFraction=75
wrapper.java.additional.7=-XX:+UseCMSInitiatingOccupancyOnly
wrapper.java.additional.8=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.9=-Djava.awt.headless=true
5 shards per index, 500K total docs, ~10-50 searches/s
4 nodes, medium instances, 1 replica
JDK 1.6.u41
I know it's hard to diagnose with just this information, but I was
wondering if anyone has seen something similar and/or if there's some
obvious setting I'm overlooking that I should be checking on. Do I simply
not have enough nodes? Is there any other information I can provide that
would help?
Ah, I wasn't clear-- this is what an extended view looks like. It GCs less
and less effectively each time until it crosses the 75% mark and then races
until it runs out of heap. Then we restart. We ended up implementing
automatic rolling restarts of our cluster once the heap crosses the 80% mark.
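For what it's worth, the restart trigger is just a heap-usage check; here's a minimal sketch of that kind of check on the JVM side (class and method names are illustrative, not our actual tooling):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal sketch of a heap-usage check like the one behind our
// rolling-restart trigger; names and threshold handling are illustrative.
public class HeapWatch {

    // Fraction of the max heap currently in use, in [0.0, 1.0].
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() is well-defined here because the node runs with -Xmx set.
        return (double) heap.getUsed() / (double) heap.getMax();
    }

    // True once usage crosses the restart threshold (we use 0.80).
    public static boolean pastThreshold(double threshold) {
        return heapUsedFraction() >= threshold;
    }
}
```

In practice a monitor polls this (or the same numbers from the nodes stats API) and kicks off the rolling restart one node at a time.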
We have 25 time-based indexes aliased to one name. 95% of our searches are
match-all queries across all the indexes using the alias, sometimes with
subtypes set. We use terms filters heavily-- many times with 50-500 terms
specified, nested inside bool filters with some other criteria.
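For reference, a typical request of the kind described above would look roughly like this in the 0.90-era query DSL (field names and term values are made up for illustration):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "terms": { "user_id": ["u1", "u2", "u3"] } },
            { "term": { "status": "active" } }
          ]
        }
      }
    }
  }
}
```

This gets sent against the alias, so it fans out to all 25 indexes (15 shards each) on every search.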
Does this help?
On Thursday, December 5, 2013 2:44:26 PM UTC-5, Jörg Prante wrote:
The graphs show that GC is working. Can you post more info about what the
queries look like and what messages appear when you run out of heap?