Getting GC overhead warnings and nodes disconnecting from the cluster

We are seeing GC warnings like the one below in the Elasticsearch log:

[2024-09-03T22:53:06,159][WARN ][o.e.m.j.JvmGcMonitorService] [es7-at-13] [gc][1188424] overhead, spent [1.4s] collecting in the last [2.1s]
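The overhead in that warning is the time spent collecting divided by the length of the observed window: 1.4 s out of 2.1 s is about 67%. As a rough sketch (Python, standard library only), the snippet below extracts that ratio from such a log line and maps it to a level, assuming the 7.x defaults `monitor.jvm.gc.overhead.warn: 50`, `monitor.jvm.gc.overhead.info: 25` and `monitor.jvm.gc.overhead.debug: 10`. It only handles durations printed in `s` or `ms`, and the function names are hypothetical.

```python
import re

# Assumed default thresholds (percent) for the GC overhead monitor in 7.x:
# monitor.jvm.gc.overhead.warn = 50, .info = 25, .debug = 10
OVERHEAD_THRESHOLDS = [("WARN", 50), ("INFO", 25), ("DEBUG", 10)]

# Matches e.g. "overhead, spent [1.4s] collecting in the last [2.1s]".
# Only "s" and "ms" units are handled in this sketch.
OVERHEAD_RE = re.compile(
    r"overhead, spent \[(?P<spent>[\d.]+)(?P<su>ms|s)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<wu>ms|s)\]"
)


def to_seconds(value: str, unit: str) -> float:
    """Convert a ('1.4', 's') or ('950', 'ms') pair to seconds."""
    return float(value) / 1000.0 if unit == "ms" else float(value)


def gc_overhead(line: str):
    """Return (overhead_percent, level) for a GC overhead log line, or None."""
    m = OVERHEAD_RE.search(line)
    if m is None:
        return None
    spent = to_seconds(m["spent"], m["su"])
    window = to_seconds(m["window"], m["wu"])
    percent = 100.0 * spent / window
    # The highest threshold the overhead reaches determines the level.
    level = next((name for name, limit in OVERHEAD_THRESHOLDS if percent >= limit), None)
    return percent, level


if __name__ == "__main__":
    line = ("[2024-09-03T22:53:06,159][WARN ][o.e.m.j.JvmGcMonitorService] "
            "[es7-at-13] [gc][1188424] overhead, spent [1.4s] collecting in the last [2.1s]")
    percent, level = gc_overhead(line)
    print(f"{percent:.1f}% of wall-clock time in GC -> {level}")
```

For the line above this prints `66.7% of wall-clock time in GC -> WARN`, i.e. the node spent about two thirds of that window collecting garbage.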

Nodes are also disconnecting from the cluster, and in some cases a node does not rejoin the cluster.

We are using the CMS garbage collector with Elasticsearch 7.17.x on Oracle JDK 8.

The same happens on dedicated data nodes: we get this GC issue even when heap usage is only about 50-60%, even though we have given the nodes with the dedicated data role 16-18 GB of heap.

Please advise.

Node details (OS, JVM, GC and the GC parameters in use):

      "os": {
        "refresh_interval_in_millis": 1000,
        "name": "Linux",
        "pretty_name": "Ubuntu 20.04.5 LTS",
        "arch": "amd64",
        "version": "5.15.0-1066-aws",
        "available_processors": 4,
        "allocated_processors": 4
      },
      "process": {
        "refresh_interval_in_millis": 1000,
        "id": 1113915,
        "mlockall": true
      },
      "jvm": {
        "pid": 1113915,
        "version": "1.8.0_131",
        "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version": "25.131-b11",
        "vm_vendor": "Oracle Corporation",
        "bundled_jdk": true,
        "using_bundled_jdk": false,
        "start_time_in_millis": 1725283552783,
        "mem": {
          "heap_init_in_bytes": 6442450944,
          "heap_max_in_bytes": 6174015488,
          "non_heap_init_in_bytes": 2555904,
          "non_heap_max_in_bytes": 0,
          "direct_max_in_bytes": 3221225472
        },
        "gc_collectors": [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools": [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers": "true",
        "input_arguments": [
          "-Xshare:auto",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dio.netty.allocator.numDirectArenas=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Dlog4j2.formatMsgNoLookups=true",
          "-Djava.locale.providers=SPI,JRE",
          "-Djava.io.tmpdir=/tmp/elasticsearch-2557950945095531367",
          "-XX:+HeapDumpOnOutOfMemoryError",
          "-XX:HeapDumpPath=data",
          "-XX:ErrorFile=logs/hs_err_pid%p.log",
          "-XX:+PrintGCDetails",
          "-XX:+PrintGCDateStamps",
          "-XX:+PrintTenuringDistribution",
          "-XX:+PrintGCApplicationStoppedTime",
          "-Xloggc:logs/gc.log",
          "-XX:+UseGCLogFileRotation",
          "-XX:NumberOfGCLogFiles=32",
          "-XX:GCLogFileSize=64m",
          "-XX:HeapDumpPath=/u01/elasticsearch/logs/es717/es7-uae-at/es7-at-11/es7-at-11-heapdump.hprof",
          "-Dlog4j2.formatMsgNoLookups=true",
          "-Xloggc:/u01/elasticsearch/logs/es717/es7-uae-at/es7-at-11/gc.log",
          "-XX:+UseConcMarkSweepGC",
          "-Xms6144m",
          "-Xmx6144m",
          "-XX:+UseParNewGC",
          "-XX:CMSInitiatingOccupancyFraction=75",
          "-XX:+UseCMSInitiatingOccupancyOnly",
          "-XX:+PrintGCDetails",
          "-verbose:gc",
          "-XX:+PrintGCTimeStamps",
          "-XX:NewSize=2048m",
          "-XX:MaxNewSize=2048m",
          "-XX:MaxTenuringThreshold=6",
          "-XX:+PrintTenuringDistribution",
          "-XX:SurvivorRatio=6",
          "-XX:+CMSClassUnloadingEnabled",
          "-XX:MaxDirectMemorySize=3221225472",
          "-Des.path.home=/u01/elasticsearch/clusters/es717/es7-uae-at/es7-at-11",
          "-Des.path.conf=/u01/elasticsearch/clusters/es717/es7-uae-at/es7-at-11/config",
          "-Des.distribution.flavor=default",
          "-Des.distribution.type=tar",
          "-Des.bundled_jdk=true"
        ]
      }
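The heap figures in the node details line up with the JVM flags. A small worked sketch (Python), assuming the young generation is pinned by `-XX:NewSize=2048m`/`-XX:MaxNewSize=2048m` and that `heap_max_in_bytes` leaves out one survivor space (only one is in use at a time): with `-XX:SurvivorRatio=6` each survivor space is 2048 / 8 = 256 MiB, so 6144 - 256 = 5888 MiB = 6174015488 bytes, which is exactly the reported value. The same flags put the old generation at 4096 MiB, with CMS starting a cycle once it is 75% full (3072 MiB) because of `-XX:+UseCMSInitiatingOccupancyOnly`. The helper name is hypothetical.

```python
MIB = 1024 * 1024


def cms_heap_layout(xmx_mib: int, new_size_mib: int, survivor_ratio: int,
                    cms_initiating_fraction: int) -> dict:
    """Derive generation sizes implied by fixed -Xmx / -XX:NewSize / -XX:SurvivorRatio flags.

    Assumes NewSize == MaxNewSize, so the young generation does not resize.
    """
    # SurvivorRatio=N means eden is N times one survivor space; young = eden + 2 survivors.
    survivor_mib = new_size_mib / (survivor_ratio + 2)
    eden_mib = survivor_mib * survivor_ratio
    old_mib = xmx_mib - new_size_mib
    return {
        "eden_mib": eden_mib,
        "survivor_mib": survivor_mib,
        "old_mib": old_mib,
        # Assumption: the reported max excludes one survivor space, since only one is usable at a time.
        "reported_heap_max_bytes": int((xmx_mib - survivor_mib) * MIB),
        # With -XX:+UseCMSInitiatingOccupancyOnly, a CMS cycle starts at this old-gen occupancy.
        "cms_trigger_mib": old_mib * cms_initiating_fraction / 100,
    }


if __name__ == "__main__":
    # Flags taken from the node's input_arguments above.
    layout = cms_heap_layout(xmx_mib=6144, new_size_mib=2048,
                             survivor_ratio=6, cms_initiating_fraction=75)
    for key, value in layout.items():
        print(f"{key}: {value}")
```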

What is the full output of the cluster stats API?

Why are you not using the bundled JVM, which uses G1GC instead of CMS?
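Whether each node is on the bundled JDK, and which collectors it runs, can be read straight from the nodes info API (`GET _nodes/jvm`), which reports `bundled_jdk`, `using_bundled_jdk` and `gc_collectors` as in the node details above. A minimal sketch, assuming the response has been saved as JSON; the node ID in the sample is made up and the function name is hypothetical:

```python
import json

# Collector names reported in "gc_collectors" when CMS (with ParNew for the young gen) is in use.
CMS_COLLECTORS = {"ParNew", "ConcurrentMarkSweep"}


def jvm_findings(nodes_info: dict) -> list:
    """Return human-readable findings for each node in a GET _nodes/jvm style response."""
    findings = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        name = node.get("name", node_id)
        if jvm.get("bundled_jdk") and not jvm.get("using_bundled_jdk"):
            findings.append(f"{name}: bundled JDK available but not used (running {jvm.get('version')})")
        if CMS_COLLECTORS & set(jvm.get("gc_collectors", [])):
            findings.append(f"{name}: CMS collector in use")
    return findings


if __name__ == "__main__":
    # Hypothetical saved response, trimmed to the fields used above.
    raw = """{"nodes": {"node-id-1": {"name": "es7-at-11", "jvm": {
        "version": "1.8.0_131", "bundled_jdk": true, "using_bundled_jdk": false,
        "gc_collectors": ["ParNew", "ConcurrentMarkSweep"]}}}}"""
    for finding in jvm_findings(json.loads(raw)):
        print(finding)
```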

How much RAM does the host have? Make sure not to allocate more than 50% of RAM to the heap. Do you have any graphs of heap usage on these nodes?

Cluster stats:

{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "8888888888",
  "cluster_uuid": "8888888888",
  "timestamp": 1725432952895,
  "status": "green",
  "indices": {
    "count": 42,
    "shards": {
      "total": 372,
      "primaries": 186,
      "replication": 1.0,
      "index": {
        "shards": {
          "min": 2,
          "max": 10,
          "avg": 8.857142857142858
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 4.428571428571429
        },
        "replication": {
          "min": 1.0,
          "max": 1.0,
          "avg": 1.0
        }
      }
    },
    "docs": {
      "count": 12073928,
      "deleted": 39315
    },
    "store": {
      "size_in_bytes": 6339484982,
      "total_data_set_size_in_bytes": 6339484982,
      "reserved_in_bytes": 0
    },
    "fielddata": {
      "memory_size_in_bytes": 0,
      "evictions": 0
    },
    "query_cache": {
      "memory_size_in_bytes": 0,
      "total_count": 14124,
      "hit_count": 94,
      "miss_count": 14030,
      "cache_size": 0,
      "cache_count": 1467,
      "evictions": 1467
    },
    "completion": {
      "size_in_bytes": 0
    },
    "segments": {
      "count": 1410,
      "memory_in_bytes": 6195888,
      "terms_memory_in_bytes": 4041256,
      "stored_fields_memory_in_bytes": 718320,
      "term_vectors_memory_in_bytes": 0,
      "norms_memory_in_bytes": 7296,
      "points_memory_in_bytes": 0,
      "doc_values_memory_in_bytes": 1429016,
      "index_writer_memory_in_bytes": 3925832,
      "version_map_memory_in_bytes": 12687,
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": 1725429467089,
      "file_sizes": {
        
      }
    },
    "mappings": {
      "field_types": [
        {
          "name": "boolean",
          "count": 2,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "constant_keyword",
          "count": 6,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "date",
          "count": 130,
          "index_count": 37,
          "script_count": 0
        },
        {
          "name": "double",
          "count": 20,
          "index_count": 7,
          "script_count": 0
        },
        {
          "name": "integer",
          "count": 4,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "ip",
          "count": 2,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "keyword",
          "count": 796,
          "index_count": 41,
          "script_count": 0
        },
        {
          "name": "long",
          "count": 36,
          "index_count": 17,
          "script_count": 0
        },
        {
          "name": "nested",
          "count": 25,
          "index_count": 3,
          "script_count": 0
        },
        {
          "name": "object",
          "count": 26,
          "index_count": 4,
          "script_count": 0
        },
        {
          "name": "text",
          "count": 80,
          "index_count": 11,
          "script_count": 0
        }
      ],
      "runtime_field_types": [
        
      ]
    },
    "analysis": {
      "char_filter_types": [
        
      ],
      "tokenizer_types": [
        {
          "name": "standard",
          "count": 36,
          "index_count": 36
        },
        {
          "name": "whitespace",
          "count": 36,
          "index_count": 36
        }
      ],
      "filter_types": [
        {
          "name": "asciifolding",
          "count": 30,
          "index_count": 30
        },
        {
          "name": "length",
          "count": 36,
          "index_count": 36
        },
        {
          "name": "lowercase",
          "count": 36,
          "index_count": 36
        },
        {
          "name": "word_delimiter",
          "count": 36,
          "index_count": 36
        }
      ],
      "analyzer_types": [
        {
          "name": "custom",
          "count": 288,
          "index_count": 36
        }
      ],
      "built_in_char_filters": [
        
      ],
      "built_in_tokenizers": [
        {
          "name": "keyword",
          "count": 72,
          "index_count": 36
        },
        {
          "name": "standard",
          "count": 72,
          "index_count": 36
        },
        {
          "name": "uax_url_email",
          "count": 72,
          "index_count": 36
        }
      ],
      "built_in_filters": [
        {
          "name": "lowercase",
          "count": 180,
          "index_count": 36
        },
        {
          "name": "unique",
          "count": 36,
          "index_count": 36
        }
      ],
      "built_in_analyzers": [
        
      ]
    },
    "versions": [
      {
        "version": "7.17.20",
        "index_count": 42,
        "primary_shard_count": 186,
        "total_primary_bytes": 3158084674
      }
    ]
  },
  "nodes": {
    "count": {
      "total": 3,
      "coordinating_only": 0,
      "data": 3,
      "data_cold": 0,
      "data_content": 0,
      "data_frozen": 0,
      "data_hot": 0,
      "data_warm": 0,
      "ingest": 0,
      "master": 3,
      "ml": 0,
      "remote_cluster_client": 0,
      "transform": 0,
      "voting_only": 0
    },
    "versions": [
      "7.17.20"
    ],
    "os": {
      "available_processors": 12,
      "allocated_processors": 12,
      "names": [
        {
          "name": "Linux",
          "count": 3
        }
      ],
      "pretty_names": [
        {
          "pretty_name": "Ubuntu 20.04.5 LTS",
          "count": 3
        }
      ],
      "architectures": [
        {
          "arch": "amd64",
          "count": 3
        }
      ],
      "mem": {
        "total_in_bytes": 99497824256,
        "free_in_bytes": 7581229056,
        "used_in_bytes": 91916595200,
        "free_percent": 8,
        "used_percent": 92
      }
    },
    "process": {
      "cpu": {
        "percent": 1
      },
      "open_file_descriptors": {
        "min": 1111,
        "max": 1133,
        "avg": 1123
      }
    },
    "jvm": {
      "max_uptime_in_millis": 1192377402,
      "versions": [
        {
          "version": "1.8.0_131",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "25.131-b11",
          "vm_vendor": "Oracle Corporation",
          "bundled_jdk": true,
          "using_bundled_jdk": false,
          "count": 3
        }
      ],
      "mem": {
        "heap_used_in_bytes": 3336681464,
        "heap_max_in_bytes": 18522046464
      },
      "threads": 266
    },
    "fs": {
      "total_in_bytes": 160994439168,
      "free_in_bytes": 118339858432,
      "available_in_bytes": 118339858432
    },
    "plugins": [
      {
        "name": "repository-s3",
        "version": "7.17.20",
        "elasticsearch_version": "7.17.20",
        "java_version": "1.8",
        "description": "The S3 repository plugin adds S3 repositories",
        "classname": "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
        "extended_plugins": [
          
        ],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "repository-gcs",
        "version": "7.17.20",
        "elasticsearch_version": "7.17.20",
        "java_version": "1.8",
        "description": "The GCS repository plugin adds Google Cloud Storage support for repositories.",
        "classname": "org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin",
        "extended_plugins": [
          
        ],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      }
    ],
    "network_types": {
      "transport_types": {
        "security4": 3
      },
      "http_types": {
        "security4": 3
      }
    },
    "discovery_types": {
      "zen": 3
    },
    "packaging_types": [
      {
        "flavor": "default",
        "type": "tar",
        "count": 3
      }
    ],
    "ingest": {
      "number_of_pipelines": 2,
      "processor_stats": {
        "gsub": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "script": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        }
      }
    }
  }
}
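Cluster stats only report totals, so per-node figures have to be divided by the node count. A rough sketch in Python, assuming the three nodes are sized identically: 18522046464 bytes of heap over 3 nodes is 6174015488 bytes (5.75 GiB) per node, matching the single node's `heap_max_in_bytes`, and the total heap is about 18.6% of the roughly 92.7 GiB of RAM reported across the cluster, well under the 50% guideline. Heap in use at the time of the snapshot was about 18% of the maximum. The function name is hypothetical.

```python
GIB = 1024 ** 3


def heap_vs_ram(cluster_stats: dict) -> dict:
    """Summarise heap against RAM from a GET _cluster/stats response.

    Per-node averages assume identically sized nodes; cluster stats only reports totals.
    """
    nodes = cluster_stats["nodes"]
    count = nodes["count"]["total"]
    heap_max = nodes["jvm"]["mem"]["heap_max_in_bytes"]
    heap_used = nodes["jvm"]["mem"]["heap_used_in_bytes"]
    ram_total = nodes["os"]["mem"]["total_in_bytes"]
    return {
        "heap_per_node_gib": heap_max / count / GIB,
        "ram_per_node_gib": ram_total / count / GIB,
        "heap_to_ram_percent": 100.0 * heap_max / ram_total,
        "heap_used_percent": 100.0 * heap_used / heap_max,
    }


if __name__ == "__main__":
    # Values copied from the cluster stats output above.
    stats = {"nodes": {"count": {"total": 3},
                       "jvm": {"mem": {"heap_used_in_bytes": 3336681464,
                                       "heap_max_in_bytes": 18522046464}},
                       "os": {"mem": {"total_in_bytes": 99497824256}}}}
    for key, value in heap_vs_ram(stats).items():
        print(f"{key}: {value:.2f}")
```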

We have been using Oracle JDK 8 with CMS GC since the 7.3.2 release.

We keep 50% of RAM free on our main production servers, but even there we see continuous GC like the log above.

[Screenshot from 2024-09-04 12-35-15]

The attached screenshot shows one of the recent GC episodes.

I would recommend switching to the bundled JVM and checking whether that, with its default settings, changes anything. It is, after all, what Elasticsearch is tested with.
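CMS was removed in JDK 14 (JEP 363) and the JDK bundled with 7.17 is newer than that, so the custom CMS flags in `input_arguments` will not select CMS there; several JDK 8-only GC logging flags were also superseded by unified logging (`-Xlog`) in JDK 9. Before trying the bundled JDK it may help to list which of the node's custom options fall into those groups. A sketch in Python; the prefix lists are an illustrative assumption rather than an exhaustive catalogue, and the function name is hypothetical:

```python
# Illustrative (not exhaustive) prefixes for options tied to CMS/ParNew or to
# JDK 8-style GC logging. This grouping is an assumption for review purposes only.
FLAGGED = {
    "CMS/ParNew collector": (
        "-XX:+UseConcMarkSweepGC",
        "-XX:+UseParNewGC",
        "-XX:CMSInitiatingOccupancyFraction",
        "-XX:+UseCMSInitiatingOccupancyOnly",
        "-XX:+CMSClassUnloadingEnabled",
    ),
    "JDK 8-style GC logging": (
        "-XX:+PrintGC",  # also matches PrintGCDetails, PrintGCDateStamps, etc.
        "-XX:+PrintTenuringDistribution",
        "-Xloggc:",
        "-XX:+UseGCLogFileRotation",
        "-XX:NumberOfGCLogFiles",
        "-XX:GCLogFileSize",
    ),
}


def flag_legacy_options(input_arguments):
    """Return (option, reason) pairs for options matching one of the FLAGGED prefixes."""
    findings = []
    for arg in input_arguments:
        for reason, prefixes in FLAGGED.items():
            if arg.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
                findings.append((arg, reason))
                break
    return findings


if __name__ == "__main__":
    # A subset of the node's input_arguments shown earlier in the thread.
    args = ["-Xms6144m", "-Xmx6144m", "-XX:+UseConcMarkSweepGC", "-XX:+UseParNewGC",
            "-XX:CMSInitiatingOccupancyFraction=75", "-XX:+PrintGCDateStamps",
            "-Xloggc:logs/gc.log", "-XX:+HeapDumpOnOutOfMemoryError"]
    for option, reason in flag_legacy_options(args):
        print(f"{option}: {reason}")
```

Options such as `-Xms6144m` or `-XX:+HeapDumpOnOutOfMemoryError` are not matched and are left for separate review.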