Miracle G1 settings for 30GB heaps

Ok, maybe not miracle, but it made you look. :smile:

I'm running this version of Java:

java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

I have 30GB heaps on 64GB servers with 16 cores, and a RAID 0 stripe across four 4TB SATA 7200 RPM disks.
I'm indexing a consistent 30k events/sec via bulk inserts; IOPS on the disks range from 100-300.

I was having very frequent young generation GC pauses, even when my heap was only about 30% used. They lasted anywhere from 1-4 seconds and happened often enough to really affect indexing throughput, so I started looking.

I know that in general, when it comes to heap tuning, it's better to just not: the VM does a very good job in most cases. In my searching I came across the blog linked below, tried the settings it suggests, and since enabling them I have not seen a young GC DEBUG/INFO/WARN in my ES logs at all. It's been just over 48 hours now.

Maybe it's too early to get this excited, but I wanted to share the settings and get some comments, and hopefully they will help someone else who might be struggling with this as well.

I should also mention that my cluster is made of 13 nodes, with 3 dedicated masters, 4 dedicated clients, and 6 dedicated data nodes.

Here are the settings I'm using:

/usr/bin/java -Xms30g -Xmx30g -Xss256k 
-Djava.awt.headless=true -server 
-XX:+UseCompressedOops 
-XX:+UseG1GC 
-XX:MaxGCPauseMillis=20 
-XX:+DisableExplicitGC 
-verbose:gc 
-Xloggc:/var/log/elasticsearch/gc.log 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+HeapDumpOnOutOfMemoryError (snip)
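
In case it saves someone some digging: rather than editing the launch command directly, one way to apply the same settings is through the service environment. This is just a sketch; it assumes the stock RPM packaging that reads /etc/sysconfig/elasticsearch, so adjust paths and variable names for your install.

# /etc/sysconfig/elasticsearch (sketch)
ES_HEAP_SIZE=30g    # becomes -Xms30g -Xmx30g
ES_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:+DisableExplicitGC"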

The only one I left out was the -XX:G1NewSizePercent=3 parameter, because apparently it's not valid on my JVM, and the VM complained (it still started, though).
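
If you do want to try it on a JVM where the flag exists, note that G1NewSizePercent is an experimental HotSpot option and has to be unlocked first. A sketch, with 3 just being the value the blog suggests:

-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=3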

Anyway, enough rambling. Check this out and let me know what you think (yes, I know it says it's for HBase :slight_smile: )

https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase

Hope it helps.
Chris

If you use G1GC you risk data loss, which is why we don't support it.

That may change, but this is the current state of play.

Thanks Mark.

Could you elaborate on the scenario where that might happen? Or link me to it?

Thank you sir.
Chris

http://wiki.apache.org/lucene-java/JavaBugs

Do not, under any circumstances, run Lucene with the G1 garbage collector. Lucene's test suite fails with the G1 garbage collector on a regular basis, including bugs that cause index corruption. There is no person on this planet that seems to understand such bugs (see JDK-8038348, open for over a year), so don't count on the situation changing soon. This information is not out of date, and don't think that the next Oracle Java release will fix the situation.


Thanks.

Well, crap. Guess CMS is still the way to go. Any miracle CMS settings to help me get started, or just the basics?

Chris
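
Edit: for anyone landing here later, a reasonable CMS starting point is simply what Elasticsearch's stock startup script already sets. A sketch of those flags (verify against your own bin/elasticsearch.in.sh):

-XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly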

Interesting, though. Reading through the bug history, I found this:

Hi everyone,

I am a committer to the Lucene/Solr project. We've recently hit what
we believe is a JIT/GC bug -- it manifests itself only when G1GC is
used, on a 32-bit VM:

Using Java: 32bit/jdk1.8.0-ea-b102 -server -XX:+UseG1GC
Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC

and later:

and are consistent before and after. jdk1.7.0_04, 64-bit does NOT
exhibit the issue (and neither does any version afterwards, it only
happens on 32-bit; perhaps it's because of smaller number of available
registers and the need to spill?).

This is very specific to the 32-bit VM, which I am not using. No argument that G1GC is not officially supported, but perhaps I'm at less risk of index issues than we thought?

Chris

That's a risk evaluation that you need to run.

We, obviously, want our customers to avoid data loss or corruption as much as possible, hence our position on this.

Understood. :slight_smile: Thanks for all the replies.

Don't panic. If you test for yourself, you can be optimistic.

Lucene committer Uwe Schindler's latest comments on Lucene and G1 can be found here: blog - devmio - Software Know-How

When observing the Lucene builds during recent months, the Lucene team noticed that the errors initially seen no longer occurred. This is also consistent with the statement by Oracle that G1GC is “ready for production” in Java 8 Update 40.

I recommend Java 8u40+, 64 bit, Red Hat Linux (no VM), and G1. Never saw a single crash or data loss because of G1 with that combination.

Thanks for the info Jorg.

We're working to move to Java 8 at the moment, so I'll make sure we go for at least that update.

Chris

G1GC caused extremely high CPU load on our systems, even though there were no requests on the servers. Never, ever use G1GC with Elasticsearch.

I hope the problem will be gone in Java 9.


Some interesting recent development:

The specific bug (JDK-8038348) referenced on https://wiki.apache.org/lucene-java/JavaBugs next to the "Do not, under any circumstances, run Lucene with the G1 garbage collector" remark has now been resolved, but nothing has changed in the official documentation, and now G1 is set to be the default GC in Java 9.

Please note that the specific corruption issue that you reference was not, and is not, the only reason to avoid the use of G1 with Elasticsearch. It's just that the corruption issue was a clear reason to never even consider using G1 with Elasticsearch; any additional discussion was moot until that point was resolved. Issues remain, though:

  • the performance tax going from CMS to G1 is quite significant due to the use of more expensive write barriers; this has a substantial impact on throughput
  • G1 has a larger footprint due to its remembered sets and collection set

Here is one example of the impact G1 GC had on a cluster: Indexing performance degrading over time

CMS is well-understood, stable and very mature at this point. Switching to G1 GC carries a heavy cost with no apparent benefits.

You are right to point out that G1 will be the default in JDK 9, but that does not mean that it's ready for primetime. In fact, there is plenty of concern in the community that this is not the case.


The motivation for G1 is low-pause garbage collection, achieved by avoiding stop-the-world phases. Stop-the-world phases can take seconds up to minutes with the CMS GC. As always there are tradeoffs, and so it is with G1.

The pros of G1 GC are

  • low-pause garbage collection, no stop-the-world phases
  • low pauses also possible on large heaps (>8 GB)
  • will be supported by Oracle in Java 9+

The cons are

  • less throughput
  • it requires extra CPU cycles and is best run on multi-core CPUs
  • it does not run perfectly out of the box; it requires intimate GC configuration knowledge (e.g. -XX:MaxGCPauseMillis; see the sketch after this list)
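
To give a feel for the kind of knobs involved, here is a sketch of a G1 starting point for a large heap. The values are illustrative only and need validating against your own workload:

-XX:+UseG1GC 
-XX:MaxGCPauseMillis=200                 # pause-time goal; G1 sizes the young gen to try to meet it
-XX:G1HeapRegionSize=16m                 # region size; allocations over half a region are "humongous"
-XX:InitiatingHeapOccupancyPercent=35    # start concurrent marking earlier than the default of 45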

There had been concerns in the community, but as JEP 248 (Make G1 the Default Garbage Collector) states:

If a critical issue is found that can't be addressed in the JDK 9 time frame, we will revert back to use Parallel GC as the default for the JDK 9 GA.

The truth is that Oracle has turned away from CMS; see JEP 291: Deprecate the Concurrent Mark Sweep (CMS) Garbage Collector.

To complete the picture, there has been a meeting of engineers from Google/Oracle/Twitter/SAP/jClarity on steps to save CMS from becoming unsupported or removed from the codebase: https://bugs.openjdk.java.net/secure/attachment/64150/cms-meeting-20-sep-2016.html

If anyone wants to rely on CMS during the whole JDK 9 lifecycle and beyond, let's hope that Google/SAP/Twitter/jClarity & Co. can establish methods to keep a supported and improved CMS in OpenJDK 9+.

I'll keep my fingers crossed, because Google has patched CMS in their Java version to work around a severe CMS performance bug, and they are holding back other goodies as well (see JEP 291: Deprecate the Concurrent Mark Sweep (CMS) Garbage Collector).

Working on the issue of long GC pauses is a hard challenge, and G1 is not the only project to tackle it. Another GC project is Shenandoah, which targets even larger heaps of hundreds of GB, but it will not be included in JDK 9.

Issues outside the JVM can also destroy GC performance, no matter which GC is running: for example, /tmp mounted on a non-tmpfs or un-tuned filesystem can cause Linux to block I/O for milliseconds whenever the JVM writes its performance stats to the hsperfdata files (mtime updates).
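
If you suspect you're hitting that one, the usual mitigation is to disable the shared hsperfdata mapping. A sketch (note that this also blinds tools such as jps and jstat, which read those files):

-XX:+PerfDisableSharedMem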

With regard to configuration findings for Elasticsearch, trust your own infrastructure, not other environments. Be open for all kinds of issues from whatever source they may come. Take metrics under your workload on your machines, measure latency, throughput, request/response times, and choose wisely.
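
For the GC side of those measurements, a sketch of logging flags that make pause behavior visible enough to compare collectors under a real workload (these names are for Java 8 and earlier; Java 9+ replaces them with unified logging via -Xlog:gc*):

-Xloggc:/var/log/elasticsearch/gc.log 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+PrintGCApplicationStoppedTime    # total stopped time, including non-GC safepoints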

Thanks for the thoughtful reply @jprante.

Note that I said no "apparent" benefits, not no "claimed" benefits. Yes, the main claimed benefit of G1 is predictable garbage collection pauses. The problem is that to achieve that there is a substantial drop in throughput. For Elasticsearch, this drop in throughput translates into lower indexing rates, and higher latency serving requests. Thus, the trade here is predictable pause times for an overall worse situation; hence, no apparent benefits.

Sadly, G1 GC is a whole lot more complicated to tune than just the single knob -XX:MaxGCPauseMillis. For example, there is the complexity of large object allocations, a potential issue for clusters executing bulk requests with large payloads.
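
To make that concrete: in G1, any allocation larger than half the region size is treated as "humongous" and handled on a slower path, so big bulk-request buffers can trip it. One common workaround is raising the region size so those buffers stay under the threshold. A sketch, with the value purely illustrative:

-XX:+UseG1GC -XX:G1HeapRegionSize=32m    # with 32m regions, only allocations over 16m are humongous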