Debugging ConcurrentMarkSweep when not low on memory

I'm trying to track down some long GC pauses that occur even when there seems
to be plenty of memory still available. Here's an example log message:
[2013-01-16 18:27:45,623][WARN ][monitor.jvm ]
[dm-essearchp102.bldrprod.local-ElasticSearch]
[gc][ConcurrentMarkSweep][50571][3] duration [13.2s], collections
[1]/[13.6s], total [13.2s]/[13.6s], memory [24.8gb]->[14.4gb]/[27.9gb]

These aren't extremely frequent (a few per day per node), which is good, but
the stop-the-world pauses can cause some nasty outlier response times.

Things I know that could cause this:

  • Low memory - The message above indicates I still have about 3GB of heap
    free when the collection kicks in, and it frees up roughly 10GB, so I'm not
    low on memory. Still, that seems like a huge chunk to free.
  • Java heap getting swapped - I have mlockall enabled correctly, so the
    Java heap is not being swapped out. (Side note: mlockall was not working
    for me for a while, and even after increasing common.jna logging, no error
    was observed in the logs.)
  • OpenJDK - Not a factor; I'm running the Oracle JDK.
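
For anyone verifying the same thing, this is roughly what the mlockall setup
looked like in the 0.19.x era (setting name, user name, and paths are
illustrative; treat this as a sketch, not my exact config):

```shell
# elasticsearch.yml (0.19.x-era setting name):
#   bootstrap.mlockall: true

# The ES process user also needs permission to lock memory, e.g. in
# /etc/security/limits.conf (user name "elasticsearch" is illustrative):
#   elasticsearch soft memlock unlimited
#   elasticsearch hard memlock unlimited

# Quick sanity check from the shell of the user that starts the node:
ulimit -l
```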

My system config is:
Elasticsearch 0.19.3
24 cores, 48GB of RAM per machine. Initially 24GB, later bumped to 28GB,
dedicated to ES
Kernel: 2.6.18-194.32.1.el5
Java: java version "1.6.0_21"
~50 million documents w/ ~600GB of total data (x2 when replicas are taken
into account)
4 nodes
Using the default Java settings from here:

Things I've done:

  • Used index.routing.allocation.total_shards_per_node to ensure some of the
    biggest indexes were evenly distributed. This helped. (As a side note, it
    would be awesome to have this set automatically to force close-to-equal
    distribution, or to have the shard router do this automatically.)
  • Bumped the heap from 24GB to 28GB. This seemed to help.
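
For reference, the shard-cap setting above is dynamic, so it can be applied
via the update-settings API on a live index; the index name and cap here are
hypothetical:

```shell
# Cap a hot index at 2 shards per node (index name "bigindex" is illustrative)
curl -XPUT 'http://localhost:9200/bigindex/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```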

After this, I am still having some (thankfully fewer) long GCs. Things that
I'm thinking about trying:

  • UseCompressedOops - This should save me a decent chunk of heap, I think.
    Anyone have any positive/negative experiences with this? Since my heap is
    less than 30GB, it should be applicable.
  • Facet optimizations - Subdividing some data between indexes that have
    different query profiles in order to optimize some facet usage. Also,
    fixing up the data model for one facet in use that I don't believe is
    efficient.
  • Getting the elasticfacets plugin going in order to get visibility into the
    field cache (it would be awesome to see these stats pushed into the core;
    most people will need them at some point).
  • Going to Java7 and evaluating some of the new GC methods (G1). Anybody
    have any experience there?
  • Running two nodes per server in order to reduce the GC impact. Anybody
    have experience with this?
  • Adding another node to the cluster.
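
A couple of the items above can be checked from the command line before
committing to them. This is a sketch; exact flag defaults vary by JVM build:

```shell
# See whether compressed oops would actually be enabled at a 28GB heap
# (prints the final value of the UseCompressedOops flag and exits)
java -Xmx28g -XX:+PrintFlagsFinal -version | grep UseCompressedOops

# Candidate flags to experiment with (set via ES_JAVA_OPTS or the service
# wrapper config; pause target and occupancy fraction are illustrative):
#   G1 on Java 7:        -XX:+UseG1GC -XX:MaxGCPauseMillis=200
#   Explicit CMS tuning: -XX:+UseConcMarkSweepGC
#                        -XX:CMSInitiatingOccupancyFraction=75
```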

If anyone has any other ideas or feedback on things to try above, it would
be much appreciated.

Thanks!
Paul

--

Just some quick notes:

  • adding nodes is the simplest way to scale ES
  • 1.6.0_21 has serious flaws in heap management; updating to the latest
    Java 6 is recommended
  • but, with a heap of >8 GB, you enter a heap-size dimension Java 6 was not
    designed for
  • Oracle will terminate Java 6 support in Feb 2013 (that is, in a few days)
  • I recommend the latest Java 7
  • G1 GC is slower and requires more CPU, but there are no longer GC stalls
    like in the CMS GC
  • G1 is the default in Java 7u4+ and works like a charm here on AMD
    Interlagos Opteron CPUs
  • there is no advantage to two JVMs per node, only even more overhead
    (you will double the GC overhead, of course)
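
Whichever collector you pick, it helps to enable GC logging first, so that any
CMS-vs-G1 comparison is based on measured pause times rather than feel. These
are standard HotSpot flags on Java 6/7 (the log path is illustrative):

```shell
# Standard HotSpot GC-logging flags (Java 6/7), added to the JVM options:
#   -verbose:gc
#   -XX:+PrintGCDetails
#   -XX:+PrintGCDateStamps
#   -Xloggc:/var/log/elasticsearch/gc.log
```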

See also my notes:
http://jprante.github.com/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html

And, of course, 0.19.3 should be updated to the latest ES version.

Best regards,

Jörg

--

I agree with the recommendations of moving to Java 7 and the latest version
of Elasticsearch. However, I don't think G1 is the default garbage collector
in Java 7 yet; at least it doesn't seem to be the case on Linux. I have also
seen several reports indicating that G1 might not be ready for prime time yet:

https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/Hg7uRIeMsm0

http://stackoverflow.com/questions/11293384/jvm-crash-with-g1-gc-and-trove-library

--

Thanks, guys! Your feedback was invaluable.

Jörg, your notes on tuning Elasticsearch are really top notch. Thank you for
providing them.

I have solved the issue with the following approaches:

  • Subdivided an index that was just too big. I had a 300+ GB index that was
    sharded 6 ways, leaving each shard at around 50GB. The default max segment
    size of 5GB, combined with a good amount of updates/deletes, left a lot of
    deletes sitting in unmerged segments. I could have re-sharded, but instead
    split it out into 4 separate indexes, and I'm now trying to cap my shard
    size at ~10GB.
  • Moved to the latest Java 7 (u11) and updated my service wrapper settings
    to be in sync with the latest ones on GitHub. The GCs reported by BigDesk
    now look much more frequent and small, versus the previous steady growth
    followed by a massive GC.
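
For concreteness, a split-out index can be created with a shard count sized
against its expected data volume; the index name and numbers below are
illustrative, and index.merge.policy.max_merged_segment is the tiered merge
policy setting behind the 5GB default mentioned above:

```shell
# One of the split-out indexes: 8 shards keeps ~80GB of data
# near the ~10GB-per-shard target (all values illustrative)
curl -XPUT 'http://localhost:9200/myindex_1' -d '{
  "settings": {
    "number_of_shards": 8,
    "number_of_replicas": 1,
    "index.merge.policy.max_merged_segment": "5gb"
  }
}'
```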

I did not move to the G1 GC or enable UseCompressedOops. Since things are
stable, I should be able to evaluate these options at some point.

Thanks!
Paul

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group, send email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.