We have a 4-node cluster with "heap_size": "29500m", and we consistently see heap usage stay above 80% for hours, climbing as high as 99%, with no garbage collection taking place. Does anyone know why GC would not run? Our current workaround is to restart the Elasticsearch service on each node, after which heap usage returns to normal.
Side Note: We haven't been able to take a heap dump when usage is this high because we need this cluster to be available.
Please always include version information and ideally also your relevant configuration (e.g. which garbage collector are you using?). Assuming you have garbage collection logs configured (they are on by default since Elasticsearch 6.2.0), I suggest you inspect them. In the standard configuration the garbage collector is triggered at 75% of heap usage.
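For reference, that 75% trigger comes from the CMS settings shipped in the stock jvm.options file. Roughly (from memory of the 6.x defaults, so double-check against your own distribution):

```
## GC configuration (Elasticsearch 6.x defaults)
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
```

If these lines are missing, the JVM falls back to its own default collector and triggering behavior, which can explain heap sitting near the limit without a concurrent collection kicking in.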
GC logs are configured in config/jvm.options and the jvm.options that you've mentioned in your post above don't include any GC-related settings. Out of the box, Elasticsearch uses the CMS garbage collector. Can you please paste the output of the nodes info API?
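If it helps, the JVM-related parts of the nodes info response can be retrieved with a request along these lines (`filter_path` is just there to trim the output; adjust or drop it as you like):

```
GET /_nodes/jvm?filter_path=nodes.*.jvm.version,nodes.*.jvm.vm_name,nodes.*.jvm.input_arguments
```

The `input_arguments` field in particular shows exactly which flags the JVM was actually started with, which is more reliable than inspecting jvm.options by hand.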
This means that the CMS garbage collector is in use, because you're running on JDK 8 and CMS is the default collector there. The JVM options you provided are (system properties omitted):
This is very minimal (even the GC is unspecified), so it appears you have a very non-standard jvm.options file. I suggest you compare it with Elasticsearch's default jvm.options file for version 6.6 and add any missing lines. Note that that file is only a template, so make sure to replace any placeholders (e.g. ${heap.dump.path}) with proper paths on your system (see the comments in the file for guidance).
After you've done this, you should have:

- the correct system properties set (you're currently missing several of them, e.g. -Dio.netty.noUnsafe=true)
- JVM options that enable garbage collection logs and that configure the garbage collector explicitly instead of leaving the choice to the JVM (the implicit choice could bite you when upgrading the JVM)
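For completeness, the GC-logging section of the stock 6.6 jvm.options looks roughly like the following (quoted from memory, so verify against the actual file; the `8:` and `9-:` prefixes apply an option only on that JVM major version, and the log paths may differ depending on how you installed Elasticsearch):

```
## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

## JDK 9+ GC logging (unified logging syntax)
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
```

With these in place you get rotated GC logs you can inspect the next time heap usage climbs, instead of having to guess whether a collection ran.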
If in doubt you can always share any files or output privately.