Heap usage holds steady at max and GC does not run. Need to force restart the cluster

Hey Christian, thank you for the quick reply. Looking at our configuration, we noticed we had none of the GC configurations set up.

GET /_nodes/jvm?pretty showed

   "input_arguments" : [
          "-Xshare:auto",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-XX:+ShowCodeDetailsInExceptionMessages",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dio.netty.allocator.numDirectArenas=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Djava.locale.providers=SPI,COMPAT",
          "-Xms28g",
          "-Xmx28g"
]

We went ahead and copied the default configs from elasticsearch/jvm.options at 7.10 · elastic/elasticsearch · GitHub

Our JVM configs are now the following, and the cluster seems to be doing much better.

   "input_arguments" : [
          "-Xshare:auto",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-XX:+ShowCodeDetailsInExceptionMessages",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dio.netty.allocator.numDirectArenas=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Djava.locale.providers=SPI,COMPAT",
          "-Xms28g",
          "-Xmx28g",
          "-XX:+UseG1GC",
          "-XX:G1ReservePercent=25",
          "-XX:InitiatingHeapOccupancyPercent=30",
          "-XX:+HeapDumpOnOutOfMemoryError",
          "-XX:+HeapDumpBeforeFullGC",
          "-XX:HeapDumpPath=/var/log/elasticsearch",
          "-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m",
          "-XX:MaxDirectMemorySize=15032385536",
          "-Des.path.home=/usr/share/elasticsearch",
          "-Des.path.conf=/etc/elasticsearch",
          "-Des.distribution.flavor=default",
          "-Des.distribution.type=rpm",
          "-Des.bundled_jdk=true"
        ]

Looking at the convo in Garbage Collection Not Working - #5 by danielmitterdorfer

I imagine our cluster was running with a GC implicitly chosen by the JVM, which was clearly not running as needed.

Again, thanks for the help!