We regularly (several times a week) get longish GCs (20s or more) due to promotion failures.
From what I understand, this type of major GC is caused by fragmentation of the heap.
So I'm wondering:
What is all the stuff ES puts into the heap that ends up in the Old Gen?
Are there any recommended strategies for dealing with this specific kind of problem?
For example, would allowing more filter caching help or cause even more
problems? And so on.
To give a little more info on our usage: we're read-heavy, nearly entirely filter operations. Our heap is at ~10 GB, nearly all of which is used by the Old Gen (until a major GC runs).
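For what it's worth, the filter cache in question is bounded by a node-level setting in ES 1.x; if more caching turned out to help, the knob would look something like this in elasticsearch.yml (the 20% value is purely illustrative, not a recommendation):

    indices.cache.filter.size: 20%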
On Monday, 17 February 2014 20:50:30 UTC, Jörg Prante wrote:

Maybe it's the field data cache that moves to the Old Gen when using facets.
I am tackling this challenge with a combination of several strategies (see the config sketch after this list):
- tuning indices.fielddata.cache.size
- working around the issue by increasing the node transport and ping timeouts from 5s to something high like 30s, so GCs are allowed to run for 20s without node disconnects
- reducing the number of shards per node (this just means reducing the number of docs / index size / filter cache per node somehow); the simplest method is adding nodes
- using heap sizes as small as possible - in my use case 6 GB are sufficient
- not sure if you want to go down the bleeding-edge path, but using Java 8 and G1GC with -XX:MaxGCPauseMillis of ~100-1000ms helps me. CPU load is a bit higher with G1GC, but since I have 32 cores per node, it does not matter that much
- otherwise, there are lots of CMS GC tuning options (this needs deep GC analysis)
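A rough sketch of what the first two strategies look like in config. The setting names are from the ES 1.x zen discovery and fielddata docs; the values are illustrative assumptions, not tested recommendations:

    # elasticsearch.yml
    indices.fielddata.cache.size: 30%      # bound field data so entries can be evicted
    discovery.zen.fd.ping_timeout: 30s     # tolerate long GC pauses before fault detection fires
    discovery.zen.fd.ping_retries: 6       # extra retries before a node is declared disconnected

And for the G1 experiment, something like this in ES_JAVA_OPTS or elasticsearch.in.sh:

    -XX:+UseG1GC -XX:MaxGCPauseMillis=200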
We don't really run facets and our field data cache size is very small.
Increasing the node transport and ping timeouts is definitely something we'll consider. Reducing the number of shards per node is also worth considering, but I'm reluctant to add more nodes at the moment (we're already spending lots of cash).
I think a deep dive into GC tuning is possibly called for, and we've done
some of that already.
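For reference, a typical starting point for that kind of dive is full GC logging plus the CMS occupancy flags (standard HotSpot options for Java 7; the log path and values are illustrative, not necessarily what we run):

    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:/var/log/elasticsearch/gc.log \
    -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

A ParNew entry containing "(promotion failed)" in that log is what confirms the fragmentation theory.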
Java 8 with G1GC is an interesting suggestion too!
Thanks again,
Nic