Elasticsearch nodes doing young-generation GC very frequently

Cluster details:
Elasticsearch version: 6.3.0
Java version: 1.8.0_191
54 data nodes
Each bare-metal machine (BM) is split into 2 VMs. Each VM has 128 GB RAM, a 31 GB heap, and 18 cores.
3 master nodes

JVM options:

-Xms31744m
-Xmx31744m
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:MaxNewSize=16168m 

It's impacting the performance of the cluster badly.

I tried different young-generation sizes ranging from 1 GB to 16 GB of heap (a rough sketch of one such configuration follows the log excerpt below).
With every one of these settings I see garbage collection being triggered roughly every second:

[2019-01-10T11:02:25,733][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][53][6] duration [1s], collections [1]/[1.6s], total [1s]/[2.4s], memory [16.1gb]->[6gb]/[29.4gb], all_pools {[young] [11.6gb]->[617.2mb]/[12.6gb]}{[survivor] [618.9mb]->[1.5gb]/[1.5gb]}{[old] [3.8gb]->[3.8gb]/[15.2gb]}
[2019-01-10T11:02:25,735][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][53] overhead, spent [1s] collecting in the last [1.6s]
[2019-01-10T11:02:33,981][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][59][8] duration [2.6s], collections [2]/[3.2s], total [2.6s]/[5s], memory [16.5gb]->[7.3gb]/[29.4gb], all_pools {[young] [11gb]->[147.3mb]/[12.6gb]}{[survivor] [1.5gb]->[406.2mb]/[1.5gb]}{[old] [3.8gb]->[6.8gb]/[15.2gb]}
[2019-01-10T11:02:33,997][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][59] overhead, spent [2.6s] collecting in the last [3.2s]
[2019-01-10T11:02:46,927][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][71][10] duration [906ms], collections [1]/[1.8s], total [906ms]/[6.1s], memory [16.7gb]->[9.8gb]/[29.4gb], all_pools {[young] [8.4gb]->[103.3mb]/[12.6gb]}{[survivor] [1.3gb]->[1.5gb]/[1.5gb]}{[old] [6.8gb]->[8.1gb]/[15.2gb]}
[2019-01-10T11:02:46,930][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][71] overhead, spent [906ms] collecting in the last [1.8s]
[2019-01-10T11:02:58,339][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][82][11] duration [1.2s], collections [1]/[1.4s], total [1.2s]/[7.4s], memory [21.9gb]->[11gb]/[29.4gb], all_pools {[young] [12.1gb]->[126mb]/[12.6gb]}{[survivor] [1.5gb]->[1.4gb]/[1.5gb]}{[old] [8.1gb]->[9.4gb]/[15.2gb]}
[2019-01-10T11:02:58,341][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][82] overhead, spent [1.2s] collecting in the last [1.4s]
[2019-01-10T11:03:13,347][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][97] overhead, spent [259ms] collecting in the last [1s]
[2019-01-10T11:03:24,163][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][107][13] duration [1.5s], collections [1]/[1.8s], total [1.5s]/[9.1s], memory [22.6gb]->[10.9gb]/[29.4gb], all_pools {[young] [12gb]->[81.5mb]/[12.6gb]}{[survivor] [1.1gb]->[915.9mb]/[1.5gb]}{[old] [9.4gb]->[9.9gb]/[15.2gb]}
[2019-01-10T11:03:24,164][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][107] overhead, spent [1.5s] collecting in the last [1.8s]
[2019-01-10T11:03:31,384][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][114] overhead, spent [399ms] collecting in the last [1.2s]
[2019-01-10T11:04:27,553][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][170] overhead, spent [657ms] collecting in the last [1s]
[2019-01-10T11:04:42,564][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][185] overhead, spent [273ms] collecting in the last [1s]
[2019-01-10T11:04:50,847][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][193][19] duration [1s], collections [1]/[1.2s], total [1s]/[11.8s], memory [23gb]->[10.7gb]/[29.4gb], all_pools {[young] [12.5gb]->[248.6mb]/[12.6gb]}{[survivor] [418.2mb]->[468.9mb]/[1.5gb]}{[old] [10gb]->[10gb]/[15.2gb]}
[2019-01-10T11:04:50,851][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][193] overhead, spent [1s] collecting in the last [1.2s]
[2019-01-10T11:05:15,877][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][218] overhead, spent [322ms] collecting in the last [1s]
[2019-01-10T11:05:44,959][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][247][22] duration [957ms], collections [1]/[1s], total [957ms]/[13.1s], memory [22.9gb]->[10.6gb]/[29.4gb], all_pools {[young] [12.4gb]->[225.1mb]/[12.6gb]}{[survivor] [437.5mb]->[379.6mb]/[1.5gb]}{[old] [10gb]->[10gb]/[15.2gb]}
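For reference, the 1 GB end of that young-generation range looked roughly like this in jvm.options (a minimal sketch, assuming the size is set via the NewSize/MaxNewSize flags; everything else stayed as listed above):

## sketch only: young generation capped at ~1 GB for one of the experiments
-XX:NewSize=1024m
-XX:MaxNewSize=1024m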

Kindly suggest what needs to be fixed to improve performance.

@elastic Kindly suggest. Our cluster performance has degraded and lag is building up in our ingestion pipelines.

@davidkarlsen @Badger I see you have faced similar issues in the past. It would be really helpful if you could help us out here.

Could this be an issue because of 2 VMs on 1 BM?

Thanks,
Nitish

What is the full output of the cluster stats API?
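(Assuming the node is reachable on localhost:9200 and security permits it, something like the following will fetch it; adjust host and credentials for your setup.)

# fetch full cluster stats, human-readable and pretty-printed
curl -s 'http://localhost:9200/_cluster/stats?human&pretty'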

Overall you appear to spend about 11 seconds in GC over three and a half minutes (the cumulative GC total in the log lines grows from about 2.4s at 11:02:25 to 13.1s at 11:05:44). It's high, but not outrageous. I doubt GC is the centre of your problems.

Do you have a rationale for using -XX:CMSInitiatingOccupancyFraction=75? If not, then remove it. It just means you can only use 3/4 of your heap.
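If you do drop it, a minimal sketch of the change in config/jvm.options (assuming the flags are laid out as in your post) is simply to comment out or delete that line and restart the node:

## occupancy fraction override removed as suggested above; whether the companion
## -XX:+UseCMSInitiatingOccupancyOnly flag should also go is an assumption, not
## part of the suggestion
# -XX:CMSInitiatingOccupancyFraction=75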


Please don't ping people like that. Most users here are community-based volunteers.

Output of the cluster stats API:


{
  "_nodes": {
    "total": 62,
    "successful": 62,
    "failed": 0
  },
  "cluster_name": "###",
  "timestamp": 1547181459080,
  "status": "green",
  "indices": {
    "count": 2647,
    "shards": {
      "total": 14178,
      "primaries": 7095,
      "replication": 0.99830866807611,
      "index": {
        "shards": {
          "min": 2,
          "max": 40,
          "avg": 5.356252361163581
        },
        "primaries": {
          "min": 1,
          "max": 20,
          "avg": 2.680392897619947
        },
        "replication": {
          "min": 0,
          "max": 1,
          "avg": 0.9977332829618436
        }
      }
    },
    "docs": {
      "count": 45373165921,
      "deleted": 274292654
    },
    "store": {
      "size_in_bytes": 115342384533402
    },
    "fielddata": {
      "memory_size_in_bytes": 29848384,
      "evictions": 0
    },
    "query_cache": {
      "memory_size_in_bytes": 0,
      "total_count": 0,
      "hit_count": 0,
      "miss_count": 0,
      "cache_size": 0,
      "cache_count": 0,
      "evictions": 0
    },
    "completion": {
      "size_in_bytes": 0
    },
    "segments": {
      "count": 279800,
      "memory_in_bytes": 344827997472,
      "terms_memory_in_bytes": 324454122049,
      "stored_fields_memory_in_bytes": 10918551272,
      "term_vectors_memory_in_bytes": 0,
      "norms_memory_in_bytes": 2593041856,
      "points_memory_in_bytes": 5337342615,
      "doc_values_memory_in_bytes": 1524939680,
      "index_writer_memory_in_bytes": 4602154631,
      "version_map_memory_in_bytes": 14059924,
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": -1,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 62,
      "data": 54,
      "coordinating_only": 0,
      "master": 3,
      "ingest": 62
    },
    "versions": [
      "6.3.0"
    ],
    "os": {
      "available_processors": 1144,
      "allocated_processors": 1144,
      "names": [
        {
          "name": "Linux",
          "count": 62
        }
      ],
      "mem": {
        "total_in_bytes": 5514092630016,
        "free_in_bytes": 609798819840,
        "used_in_bytes": 4904293810176,
        "free_percent": 11,
        "used_percent": 89
      }
    },
    "process": {
      "cpu": {
        "percent": 363
      },
      "open_file_descriptors": {
        "min": 1635,
        "max": 4105,
        "avg": 3102
      }
    },
    "jvm": {
      "max_uptime_in_millis": 11555411240,
      "versions": [
        {
          "version": "1.8.0_181",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "25.181-b13",
          "vm_vendor": "Oracle Corporation",
          "count": 8
        },
        {
          "version": "1.8.0_191",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "25.191-b12",
          "vm_vendor": "Oracle Corporation",
          "count": 54
        }
      ],
      "mem": {
        "heap_used_in_bytes": 744203944496,
        "heap_max_in_bytes": 1928486518784
      },
      "threads": 11634
    },
    "fs": {
      "total_in_bytes": 345595116142592,
      "free_in_bytes": 230171333484544,
      "available_in_bytes": 230171333484544
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "security4": 62
      },
      "http_types": {
        "security4": 62
      }
    }
  }
}

