Promotion failures (GC issues)

nicolas_long · February 17, 2014, 3:34pm

Hey all,

we regularly (several times a week) get longish GCs (20s or more) due to
promotion failures.

From what I understand this type of major GC is caused by fragmentation of
the heap.

So I'm wondering:

What is all the stuff ES puts into the heap that ends up in the Old Gen?
Are there any recommended strategies for dealing with this specific kind
of problem?

For example, would allowing more filter caching help or cause even more
problems? And so on.

To give a little more info on our usage: we're read-heavy, and nearly all
of our operations are filters. Our heap is ~10 GB, nearly all of which is
used by the Old Gen (until a major GC runs).

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4b3ef926-94b7-4de0-b076-d5fdbc44021c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · February 17, 2014, 8:50pm

Maybe it's the field cache that moves to old gen, when using facets.

I am tackling this challenge with a combination of several strategies:

- tuning indices.fielddata.cache.size
- working around the issue by increasing the node transport and ping
  timeouts from 5s to something high like 30s (so GCs are allowed to run
  20s without node disconnects)
- reducing the number of shards per node (this just means reducing the
  number of docs / index size / filter cache per node somehow); the
  simplest method is adding nodes
- using heap sizes as small as possible; in my use case 6G is sufficient
- not sure if you want to go down the bleeding-edge path, but using Java 8
  and G1GC with -XX:MaxGCPauseMillis of ~100-1000ms helps me. CPU load is
  a bit higher with G1GC, but since I have 32 cores on a node, it does not
  matter that much.
- otherwise, there are lots of CMS GC tuning options (needs deep GC
  analysis)
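
As a rough illustration of the small-heap and G1GC suggestions, here is a minimal Python sketch that assembles the corresponding JVM flags. The flags themselves (-Xms, -Xmx, -XX:+UseG1GC, -XX:MaxGCPauseMillis) are standard HotSpot options; the helper name g1_jvm_options and the 200 ms pause target are assumptions picked for illustration, the latter from within the ~100-1000ms range mentioned above.

```python
from typing import List


def g1_jvm_options(heap_gb: int, max_pause_ms: int) -> List[str]:
    """Build HotSpot flags for a fixed-size heap collected by G1.

    Hypothetical helper for illustration only; it is not part of
    Elasticsearch or of any JVM tooling.
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    if max_pause_ms <= 0:
        raise ValueError("max_pause_ms must be positive")
    return [
        f"-Xms{heap_gb}g",  # initial heap size
        f"-Xmx{heap_gb}g",  # maximum heap size, equal to the initial size
        "-XX:+UseG1GC",     # select the G1 collector
        # Pause-time goal for G1: a target, not a hard guarantee.
        f"-XX:MaxGCPauseMillis={max_pause_ms}",
    ]


if __name__ == "__main__":
    # 6 GB heap as in the post above; 200 ms is an assumed pause target.
    print(" ".join(g1_jvm_options(heap_gb=6, max_pause_ms=200)))
```

How these flags reach the JVM depends on your startup scripts. Any CMS flags already set there would have to be removed, since the JVM refuses to start with conflicting collector selections.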

Jörg

nicolas_long · February 18, 2014, 3:02pm

Hey Jörg,

thanks for the detailed reply.

We don't really run facets and our field data cache size is very small.
Increasing the node transport and ping timeouts is definitely something
we'll consider. Reducing the number of shards per node is also something to
consider, but I am reluctant to add more nodes at the moment (we are
already spending lots of cash).

I think a deep dive into GC tuning is possibly called for, and we've done
some of that already.
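
One possible starting point for that kind of analysis is to pull the promotion-failure pauses out of the GC log. The Python sketch below assumes GC logging is enabled with -XX:+PrintGCDetails and that each collection is logged on a single line ending in a "[Times: ... real=N secs]" block. The two sample lines are made up for illustration, modelled on ParNew/CMS output, and the exact layout varies between JVM versions. The function name promotion_failure_pauses is hypothetical.

```python
import re
from typing import Iterable, List

# Wall-clock pause as reported in the trailing "[Times: ...]" block.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def promotion_failure_pauses(lines: Iterable[str]) -> List[float]:
    """Return the wall-clock pause (seconds) of each promotion failure."""
    pauses = []
    for line in lines:
        if "promotion failed" not in line:
            continue  # ordinary young or old collection, not of interest
        match = REAL_TIME.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses


# Illustrative log lines (assumed format, not taken from a real cluster).
SAMPLE_LOG = [
    "1234.567: [GC 1234.567: [ParNew: 471872K->52416K(471872K), 0.0450000 secs] "
    "9400000K->8990000K(10433408K), 0.0451000 secs] "
    "[Times: user=0.30 sys=0.00, real=0.05 secs]",
    "1301.200: [GC 1301.200: [ParNew (promotion failed): "
    "471872K->471872K(471872K), 0.4100000 secs]"
    "1301.610: [CMS: 9650000K->3100000K(9961536K), 21.3000000 secs] "
    "10100000K->3100000K(10433408K), 21.7100000 secs] "
    "[Times: user=22.10 sys=0.05, real=21.71 secs]",
]

if __name__ == "__main__":
    for pause in promotion_failure_pauses(SAMPLE_LOG):
        print(f"promotion failure, pause {pause:.2f}s")
```

Comparing the pauses found this way with the configured ping timeout shows whether a given GC was long enough to risk a node disconnect.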

Java 8 with G1GC is an interesting suggestion too!

Thanks again,

Nic
