I'm trying to optimize some large GCs that I'm seeing when there seems to
be plenty of memory still available. Here's an example log message:
[2013-01-16 18:27:45,623][WARN ][monitor.jvm ]
[dm-essearchp102.bldrprod.local-ElasticSearch]
[gc][ConcurrentMarkSweep][50571][3] duration [13.2s], collections
[1]/[13.6s], total [13.2s]/[13.6s], memory [24.8gb]->[14.4gb]/[27.9gb]
These aren't extremely frequent (a few per day per node), which is good, but
the stop-the-world pauses can cause some nasty outlier response times.
Things I know that could cause this:
Low memory - The message above indicates I still have about 3GB of heap left
(24.8GB used of 27.9GB) when it kicks in, and it frees up roughly 10GB, so I'm
not low on memory, but that seems like a huge chunk to free.
Java heap getting swapped - I have mlockall enabled correctly (see the config
sketch after this list), so the Java heap is not getting into swap. (Side
note: mlockall was not working for me for a while, and even after increasing
common.jna logging, no error was observed in the logs.)
OpenJDK - Not a factor here; I'm running the JVM from Oracle, not OpenJDK.
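For reference, here's a minimal sketch of the settings involved in keeping the
heap locked; the user name and limits below are just examples for my
environment:

  # elasticsearch.yml
  bootstrap.mlockall: true

  # /etc/security/limits.conf - allow the user running ES to lock memory
  elasticsearch soft memlock unlimited
  elasticsearch hard memlock unlimited

After a restart, running ulimit -l as that user should report unlimited.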
My system config is:
Elasticsearch 0.19.3
24 cores, 48GB of RAM. Initially 24GB dedicated to ES, later bumped to 28GB.
Kernel: 2.6.18-194.32.1.el5
Java: java version "1.6.0_21"
~50 million documents w/ ~600GB of total data (x2 when replicas are taken
into account)
4 nodes
Using the default java settings from here:
Things I've done:
Used index.routing.allocation.total_shards_per_node to ensure the shards of
some of the biggest indexes were evenly distributed (sketched after this
list). This helped (as a side note, it would be awesome to have this set
automatically to force close-to-equal distribution, or to have the shard
router do this automatically).
Bumped up the heap from 24GB to 28GB. This seemed to help.
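For anyone curious, the two changes above boil down to something like this.
The index name is hypothetical, and the shard math assumes the 6-shard index
with one replica spread over the 4 nodes:

  # 6 shards x 2 copies over 4 nodes = 3 shard copies per node for an even spread
  curl -XPUT 'localhost:9200/bigindex/_settings' -d '{
    "index.routing.allocation.total_shards_per_node": 3
  }'

  # heap size read by the startup script (older scripts use ES_MIN_MEM/ES_MAX_MEM)
  export ES_HEAP_SIZE=28g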
After this, I am still having some (thankfully fewer) long GCs. Things that
I'm thinking about trying:
UseCompressedOops - This should save me a decent chunk of heap, I think.
Anyone have any positive/negative experiences with this? Since my heap is
less than 30GB, it should be applicable (see the check sketched after this
list).
Facet optimizations - Subdividing some data between indexes that have
different query profiles in order to optimize some facet usage. Also, fixing
up the data model for one facet in use that I don't believe is efficient.
Getting the elasticfacets plugin going in order to get visibility into the
field cache (it would be awesome to see these stats get pushed into the core;
most people will need them at some point).
Going to Java 7 and evaluating some of the new GC methods (G1). Anybody
have any experience there?
Running two nodes per server in order to reduce the GC impact. Anybody have
experience with this?
Adding another node to the cluster.
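On the compressed oops point, a quick sanity check on a recent JDK (just a
sketch, nothing ES-specific) is to ask the JVM what it will actually do for a
28GB heap; Java 6u23+ and Java 7 turn the flag on automatically below roughly
32GB, while older Java 6 updates need it passed explicitly:

  java -Xmx28g -XX:+PrintFlagsFinal -version | grep UseCompressedOops

  # the G1 experiment mentioned above is also just a flag swap in the JVM options:
  # -XX:+UseG1GC   (in place of the default CMS/ParNew flags)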
If anyone has any other ideas or feedback on things to try above, it would
be much appreciated.
I agree with recommendations of moving to java 7 and the latest version of
elasticsearch. However, I don't think G1 is the default garbage collector
in java 7 yet. At least it doesn't seem to be the case on linux. I have
also seen several reports that indicate that G1 might not be ready for
prime time yet:
Jorg, your notes on tuning elasticsearch are really top notch. Thank you
for providing them.
I have solved the issue with the following approaches:
Subdivided an index that was just too big. I had a 300+ GB index that was
sharded 6 ways, leaving each shard at around 50GB. With the default max
segment size of 5GB, combined with a good amount of updates/deletes, this left
a lot of deletes sitting in unmerged segments. I could have re-sharded, but
instead I split it out into 4 separate indexes, and I'm now trying to cap my
shard size at ~10GB (sketched below).
Moved to the latest Java 7 (u11). Updated my service wrapper settings to be
in sync with the latest ones on GitHub (the GC-related flags are sketched
below). The GCs reported by BigDesk now look much more frequent and smaller,
versus the earlier pattern of steady growth followed by a massive GC.
I did not move to the G1 GC or enable UseCompressedOops. Since things are
stable, I should be able to evaluate these options at some point.
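To make the first point concrete, the split boiled down to creating several
smaller indexes sized so each shard stays around 10GB. The index name and
shard count here are hypothetical, and the merge setting is shown at its 5GB
default just to indicate where it's configured:

  # ~300GB split over 4 indexes ≈ 75GB each; 8 shards keeps each shard near 10GB
  curl -XPUT 'localhost:9200/docs_part1' -d '{
    "settings": {
      "index.number_of_shards": 8,
      "index.number_of_replicas": 1,
      "index.merge.policy.max_merged_segment": "5gb"
    }
  }'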
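For the service wrapper settings, the GC-related defaults I synced up with
look roughly like this (from memory, so treat it as a sketch and diff against
the actual elasticsearch.in.sh / wrapper config on GitHub):

  JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
  JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
  JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
  JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"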
Thanks!
Paul
On Friday, January 18, 2013 12:56:20 PM UTC-7, Igor Motov wrote:
I agree with recommendations of moving to java 7 and the latest version of
elasticsearch. However, I don't think G1 is the default garbage collector
in java 7 yet. At least it doesn't seem to be the case on linux. I have
also seen several reports that indicate that G1 might not be ready for
prime time yet: