How to avoid incomplete garbage collection by G1GC causing high CPU usage on Elasticsearch nodes?

This was an issue I encountered before:

I asked the same question on another forum, and someone replied as follows:
A ThreadLocal's ThreadLocalMap holds a table of Entry objects. With heavy use, most Entry elements in the table can end up pointing to null, which makes it slow to find a slot when setting or adding an object. The Entry objects in the table are also weak references, so they are only cleaned up when the JVM's GC processes them. G1GC collects only part of the heap in each cycle, so it may not reach the problematic ThreadLocal for a long time; a Full GC is needed to release it immediately.
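
The weak-reference behavior described in that reply can be illustrated with a minimal sketch. It uses a plain WeakReference as a stand-in for a ThreadLocalMap entry (whose key is held weakly); the class and method names are hypothetical, and this is not Elasticsearch or JDK internal code. It only shows that a weakly referenced key stays visible until a collection actually processes it. System.gc() is a request that the JVM may ignore, so the result can differ between collectors and JVM flags.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; a plain WeakReference stands in for a ThreadLocalMap entry.
class WeakEntryDemo {

    // Returns true once the weakly referenced key has been cleared,
    // false if it was still present after the given number of GC requests.
    static boolean clearedAfterGc(int attempts) {
        Object key = new Object();
        WeakReference<Object> entry = new WeakReference<>(key);
        key = null; // drop the only strong reference to the key

        for (int i = 0; i < attempts && entry.get() != null; i++) {
            System.gc(); // only a request; the JVM decides whether to collect
            try {
                Thread.sleep(50); // give the collector a moment to run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return entry.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("cleared after GC requests: " + clearedAfterGc(20));
    }
}
```

Until a collection clears the reference, the entry keeps occupying its slot, which is the situation the reply describes for a region that G1GC does not visit for a long time.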

Later, I confirmed that running "jmap -histo:live" to force a Full GC restores normal CPU behavior on the node. After switching from G1GC to CMS, the issue no longer occurred.
Is this an Elasticsearch bug? Besides changing the garbage collector, are there any other ways to address this problem?

The issue you linked to uses a very old version of Elasticsearch that has been EOL for a long time. G1GC was never supported for this version, so if you are using it you are in unsupported territory. I would recommend you upgrade to the latest version, where G1GC is officially supported and tested.

This was an issue I encountered on a cluster running ES version 5.6.3, and later faced the same problem on a cluster with ES version 7.5.2. The issue was also resolved by switching from G1GC to CMS. I suspect that this problem may still exist in the latest ES version 7.

I would recommend upgrading to at least version 7.17 and seeing whether there are any issues when you use the default settings. Version 7.5.2 is also quite old.

Has anyone else reported similar issues? Can we confirm if it is a bug in ES or JDK? Would upgrading to a certain version of ES or JDK resolve it?

Which JVM version are you using with your 7.5.2 installation? I do not remember exactly when G1GC became officially supported, but I recall that it required a JVM that was new at the time (possibly Java 10 or Java 11?). G1GC was made the default later in the ES 7 series. If you upgrade to version 7.17, you should have a supported JVM bundled and G1GC set up by default, if I remember correctly.
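
For checking the JVM version and active collector, here is a minimal sketch using the standard java.lang.management API. The class name is hypothetical, and the program reports on the JVM it runs in, not on a running Elasticsearch node, so it is only useful when launched with the same JVM binary and GC flags as the node. For a live node, the nodes info API (`GET _nodes/jvm`) should report the JVM version and collectors directly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; reports on the JVM it is run in.
class JvmGcReport {

    // One line per registered collector: name, collection count, total time in ms.
    static List<String> collectorLines() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // JVM name and version, as reported by the standard system properties.
        System.out.println(System.getProperty("java.vm.name")
                + " " + System.getProperty("java.version"));
        collectorLines().forEach(System.out::println);
    }
}
```

The collector names in the output indicate which GC is in use, and comparing the counts over time gives a rough view of how often each collector runs.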


I use the bundled JVM:

openjdk version "13.0.1" 2019-10-15
OpenJDK Runtime Environment AdoptOpenJDK (build 13.0.1+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13.0.1+9, mixed mode, sharing)