Using G1 with Elasticsearch

shaunak · May 1, 2015, 10:57am

Background

Oracle introduced the Garbage First Garbage Collector (G1GC) in Java 7 as an alternative to the Concurrent Mark-Sweep (CMS) Garbage Collector, which Elasticsearch uses by default. G1 is intended for server-side JVM applications, such as Elasticsearch, that tend to run with very large heaps.

When CMS was written, few applications used anywhere near as much Java heap space as many applications now take for granted. As a result, CMS often cannot cope well with the large heaps of modern applications. This can result in long GC-related "stop the world" pauses that, in the case of Elasticsearch with very large heaps, can cause a node to drop in and out of the cluster.

For any application in this situation, these long pauses can surface as a problem in production, when it is too late to recover. This is exactly the problem that G1GC hopes to solve.
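To make the "drop in and out of the cluster" failure mode concrete, here is a minimal Python sketch of why a long pause can look like a dead node. It assumes a simplified fault-detection model in which the master pings a node, waits up to `timeout_s` seconds for each reply, retries immediately after a timeout, and removes the node after `retries` consecutive failures. The function names and the 30-second / 3-retry defaults are illustrative assumptions, not Elasticsearch's exact implementation.

```python
def consecutive_ping_failures(pause_s: float, timeout_s: float) -> int:
    """Count pings that time out while a node is frozen in a GC pause.

    Simplified, hypothetical model: the first ping is sent when the pause
    starts, each ping fails after timeout_s seconds without a reply, and the
    next ping is sent immediately after a failure. A ping sent at time t
    succeeds if the pause ends before t + timeout_s.
    """
    failures = 0
    sent_at = 0.0
    while sent_at + timeout_s < pause_s:
        failures += 1
        sent_at += timeout_s  # retry as soon as the previous ping times out
    return failures


def node_dropped(pause_s: float, timeout_s: float = 30.0, retries: int = 3) -> bool:
    """Return True if a pause of pause_s seconds would get the node removed.

    The default timeout and retry count are assumptions for illustration.
    """
    return consecutive_ping_failures(pause_s, timeout_s) >= retries


for pause in (0.5, 45.0, 95.0):
    print(f"{pause:>5.1f}s pause -> dropped: {node_dropped(pause)}")
```

Under these assumptions a 95-second pause produces three consecutive timeouts and the node is removed; as soon as the pause ends the node answers again and can rejoin, which is the flapping behavior that makes long pauses so disruptive.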

Oracle has supported G1 since Java 7 update 4, and it continues to support G1 as its non-default garbage collector for server applications with large heaps. In short, G1 was designed to avoid the long pauses (or "interruptions", as Oracle's description of G1GC calls them) that currently occur with CMS, the collector Elasticsearch uses by default.
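G1 avoids long pauses by splitting the heap into regions and, in each pause, evacuating only as many regions as it predicts it can finish within a pause-time goal (set with the HotSpot flag `-XX:MaxGCPauseMillis`), preferring regions that hold the most garbage, which is where the name "Garbage First" comes from. The Python sketch below is a toy model of that selection: the `Region` type, its per-region cost estimates, and the stop-at-first-overflow rule are hypothetical simplifications, not HotSpot's actual heuristics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    region_id: int
    garbage_mb: int        # reclaimable space in this region (hypothetical)
    predicted_ms: float    # hypothetical estimate of the time to evacuate it


def choose_collection_set(regions: List[Region], pause_target_ms: float) -> List[int]:
    """Pick regions to evacuate in one pause, garbage-richest first.

    Toy model: regions are considered in order of decreasing garbage, and
    selection stops at the first region whose predicted cost would push the
    pause past pause_target_ms.
    """
    chosen: List[int] = []
    remaining_ms = pause_target_ms
    for region in sorted(regions, key=lambda r: r.garbage_mb, reverse=True):
        if region.predicted_ms > remaining_ms:
            break  # collecting this region would likely exceed the pause goal
        chosen.append(region.region_id)
        remaining_ms -= region.predicted_ms
    return chosen


heap = [Region(0, 10, 50.0), Region(1, 90, 120.0), Region(2, 60, 70.0), Region(3, 80, 40.0)]
print(choose_collection_set(heap, pause_target_ms=200.0))  # [1, 3]
```

The garbage left in regions 2 and 0 waits for a later pause; bounding each pause in this way trades some throughput for predictable pause times.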

So why not switch to G1?

Unfortunately, the story is not as one-sided as it seems. Although Oracle has stated that G1 is intended to eventually replace CMS, it has not yet made the switch.

At Elasticsearch, we believe that G1 is not the default garbage collector because it is not yet ready for production use. We do not take this position lightly; it is based on what we have observed in Elasticsearch's test framework as well as Apache Lucene's.

Both projects are built continuously and independently against multiple versions of Java to ensure compatibility. Critically, both projects run some of their builds with G1 enabled as the garbage collector, so we use these builds as a barometer for whether we can claim to support that configuration.
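A minimal Python sketch of a build matrix like the one described above follows. It assumes each build is identified by a Java version and a collector, and that the collector is selected through standard HotSpot flags (`-XX:+UseParNewGC` and `-XX:+UseConcMarkSweepGC` for CMS, `-XX:+UseG1GC` for G1). The version strings and the `build_matrix` function are hypothetical placeholders, not the actual configuration of the Elasticsearch or Lucene CI.

```python
from itertools import product
from typing import Dict, List, Sequence

# Standard HotSpot flags that select each collector.
GC_FLAGS: Dict[str, Sequence[str]] = {
    "cms": ("-XX:+UseParNewGC", "-XX:+UseConcMarkSweepGC"),
    "g1": ("-XX:+UseG1GC",),
}

# Hypothetical Java versions to test against.
JAVA_VERSIONS = ("1.7.0_80", "1.8.0_45")


def build_matrix(
    java_versions: Sequence[str] = JAVA_VERSIONS,
    collectors: Sequence[str] = ("cms", "g1"),
) -> List[dict]:
    """Pair every Java version with every collector, one build per pair."""
    return [
        {"java": version, "gc": gc, "jvm_args": list(GC_FLAGS[gc])}
        for version, gc in product(java_versions, collectors)
    ]


for build in build_matrix():
    print(build["java"], build["gc"], " ".join(build["jvm_args"]))
```

Because every Java version is paired with both collectors, a failure that shows up only in a G1 build can be compared against an otherwise identical CMS build.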

Example Issues

In our build environments, we have witnessed multiple hard-to-reproduce issues that appear only when running G1GC, including:

Segmentation Faults
Unexpected Hangs
Unexpected Failures (this one actually reproduced)
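Spotting issues that appear only when running G1GC comes down to grouping build failures by collector. The Python sketch below assumes a hypothetical `BuildResult` record with a free-form failure label; the records in the usage lines are made-up inputs for illustration, not real build data.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class BuildResult(NamedTuple):
    java: str               # Java version the build ran on
    gc: str                 # collector used, e.g. "cms" or "g1"
    failure: Optional[str]  # hypothetical failure label; None if the build passed


def failures_only_seen_with(results: Iterable[BuildResult], collector: str) -> List[str]:
    """Return failure labels that occurred only in builds using `collector`."""
    collectors_by_failure: Dict[str, Set[str]] = {}
    for result in results:
        if result.failure is not None:
            collectors_by_failure.setdefault(result.failure, set()).add(result.gc)
    return sorted(
        failure for failure, gcs in collectors_by_failure.items() if gcs == {collector}
    )


sample = [
    BuildResult("8u40", "g1", "segfault"),
    BuildResult("8u40", "cms", None),
    BuildResult("8u45", "g1", "hang"),
    BuildResult("8u45", "cms", "assertion"),
    BuildResult("8u45", "g1", "assertion"),
]
print(failures_only_seen_with(sample, "g1"))  # ['hang', 'segfault']
```

Because these failures are hard to repeat, a single clean G1 run says little; a grouping like this only becomes meaningful across many builds.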

OpenJDK bugs reported through the testing frameworks above:

OpenJDK Bug Tracking (from Elasticsearch/Lucene builds)
OpenJDK G1 Bug Tracking (new crashes reported in November, 2014)

JDK release notes continue to list many ominous G1-related fixes (search for "G1"):

Raw JDK 8 Changesets:

Hotspot (getting less frequent!)

As with any software, not all issues are created equal, but there are still too many for us to safely endorse G1.

Do not use G1 yet, but maybe soon

We are very excited about the prospect of G1, and we are even aware of multiple clusters that run successfully with it. Despite those successes, the failures we know of mean that we still cannot safely endorse using G1.

As new Java updates are released, we constantly revisit our position on G1, and we are excited to see less instability with each release. We look forward to the day when we can suggest that our users switch to G1. That day is just not today.

faitlezen · June 11, 2015, 9:51pm

Has anyone tried Elasticsearch 1.6 with JDK 1.8 and G1? Is it still not recommended?

mvleandro · June 22, 2017, 12:18pm

What is Elastic's current position on using the G1 collector?

Have new tests been run?

How many times have you run your successful tests?

If my cluster configuration and environment match those of your successful clusters, why can I not use G1?

