Elasticsearch nodes doing young generation gc very frequently

Nitish_Goyal · January 10, 2019, 6:28am

Cluster details:
Elasticsearch version: 6.3.0
Java version: 1.8.0_191
54 data nodes
Each bare-metal (BM) host is split into 2 VMs. Each VM has 128 GB RAM, a 31 GB heap, and 18 cores.
3 master nodes

JVM options:

-Xms31744m
-Xmx31744m
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:MaxNewSize=16168m

The frequent young generation GC is badly impacting the performance of the cluster.

I tried different memory settings for the young generation, ranging from 1 GB to 16 GB of heap.
With every setting, I see garbage collection being triggered every second:

[2019-01-10T11:02:25,733][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][53][6] duration [1s], collections [1]/[1.6s], total [1s]/[2.4s], memory [16.1gb]->[6gb]/[29.4gb], all_pools {[young] [11.6gb]->[617.2mb]/[12.6gb]}{[survivor] [618.9mb]->[1.5gb]/[1.5gb]}{[old] [3.8gb]->[3.8gb]/[15.2gb]}
[2019-01-10T11:02:25,735][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][53] overhead, spent [1s] collecting in the last [1.6s]
[2019-01-10T11:02:33,981][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][59][8] duration [2.6s], collections [2]/[3.2s], total [2.6s]/[5s], memory [16.5gb]->[7.3gb]/[29.4gb], all_pools {[young] [11gb]->[147.3mb]/[12.6gb]}{[survivor] [1.5gb]->[406.2mb]/[1.5gb]}{[old] [3.8gb]->[6.8gb]/[15.2gb]}
[2019-01-10T11:02:33,997][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][59] overhead, spent [2.6s] collecting in the last [3.2s]
[2019-01-10T11:02:46,927][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][71][10] duration [906ms], collections [1]/[1.8s], total [906ms]/[6.1s], memory [16.7gb]->[9.8gb]/[29.4gb], all_pools {[young] [8.4gb]->[103.3mb]/[12.6gb]}{[survivor] [1.3gb]->[1.5gb]/[1.5gb]}{[old] [6.8gb]->[8.1gb]/[15.2gb]}
[2019-01-10T11:02:46,930][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][71] overhead, spent [906ms] collecting in the last [1.8s]
[2019-01-10T11:02:58,339][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][82][11] duration [1.2s], collections [1]/[1.4s], total [1.2s]/[7.4s], memory [21.9gb]->[11gb]/[29.4gb], all_pools {[young] [12.1gb]->[126mb]/[12.6gb]}{[survivor] [1.5gb]->[1.4gb]/[1.5gb]}{[old] [8.1gb]->[9.4gb]/[15.2gb]}
[2019-01-10T11:02:58,341][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][82] overhead, spent [1.2s] collecting in the last [1.4s]
[2019-01-10T11:03:13,347][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][97] overhead, spent [259ms] collecting in the last [1s]
[2019-01-10T11:03:24,163][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][107][13] duration [1.5s], collections [1]/[1.8s], total [1.5s]/[9.1s], memory [22.6gb]->[10.9gb]/[29.4gb], all_pools {[young] [12gb]->[81.5mb]/[12.6gb]}{[survivor] [1.1gb]->[915.9mb]/[1.5gb]}{[old] [9.4gb]->[9.9gb]/[15.2gb]}
[2019-01-10T11:03:24,164][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][107] overhead, spent [1.5s] collecting in the last [1.8s]
[2019-01-10T11:03:31,384][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][114] overhead, spent [399ms] collecting in the last [1.2s]
[2019-01-10T11:04:27,553][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][170] overhead, spent [657ms] collecting in the last [1s]
[2019-01-10T11:04:42,564][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][185] overhead, spent [273ms] collecting in the last [1s]
[2019-01-10T11:04:50,847][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][193][19] duration [1s], collections [1]/[1.2s], total [1s]/[11.8s], memory [23gb]->[10.7gb]/[29.4gb], all_pools {[young] [12.5gb]->[248.6mb]/[12.6gb]}{[survivor] [418.2mb]->[468.9mb]/[1.5gb]}{[old] [10gb]->[10gb]/[15.2gb]}
[2019-01-10T11:04:50,851][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][193] overhead, spent [1s] collecting in the last [1.2s]
[2019-01-10T11:05:15,877][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][218] overhead, spent [322ms] collecting in the last [1s]
[2019-01-10T11:05:44,959][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][247][22] duration [957ms], collections [1]/[1s], total [957ms]/[13.1s], memory [22.9gb]->[10.6gb]/[29.4gb], all_pools {[young] [12.4gb]->[225.1mb]/[12.6gb]}{[survivor] [437.5mb]->[379.6mb]/[1.5gb]}{[old] [10gb]->[10gb]/[15.2gb]}

Kindly suggest what needs to be fixed for better performance.

Nitish_Goyal · January 10, 2019, 10:36am

@elastic Kindly suggest a fix. Our cluster performance has degraded and lag is building up in our ingestion pipelines.

@davidkarlsen @Badger I see you have faced similar issues in the past. It would be really helpful if you could help us out here.

Could this be an issue because of 2 VMs on 1 BM?

Thanks,
Nitish

Christian_Dahlqvist · January 10, 2019, 11:17am

What is the full output of the cluster stats API?
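
For anyone following along, the cluster stats API is served at `GET /_cluster/stats`. A minimal Python sketch for fetching it, assuming a node reachable at a hypothetical `http://localhost:9200`; the optional basic-auth arguments are placeholders and only matter if security is enabled on the cluster:

```python
import base64
import json
import urllib.request
from typing import Optional


def cluster_stats_url(base_url: str) -> str:
    """Build the URL of the cluster stats endpoint for a given node address."""
    return base_url.rstrip("/") + "/_cluster/stats?human&pretty"


def fetch_cluster_stats(base_url: str,
                        user: Optional[str] = None,
                        password: Optional[str] = None) -> dict:
    """Fetch and decode the cluster stats document from one node."""
    request = urllib.request.Request(cluster_stats_url(base_url))
    if user is not None and password is not None:
        # HTTP basic auth: base64("user:password") in the Authorization header.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example (needs a reachable cluster, so it is left commented out):
# stats = fetch_cluster_stats("http://localhost:9200")
# print(stats["nodes"]["count"])
```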

Badger · January 10, 2019, 2:15pm

Overall you appear to spend about 11 seconds in GC over three and a half minutes. It's high, but not outrageous. I doubt GC is the centre of your problems.
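
One way to reproduce that estimate from the log is to compare the cumulative young GC time on the first and last `[gc][young]` entries. This is only a sketch: it assumes the second value in `total [..]/[..]` is the collector's cumulative collection time, it ignores the pause that precedes the first entry, and it does not count old generation collections.

```python
import re
from datetime import datetime

# First and last [gc][young] entries from the log above, trimmed after "total".
FIRST = ("[2019-01-10T11:02:25,733][WARN ][o.e.m.j.JvmGcMonitorService] [####] "
         "[gc][young][53][6] duration [1s], collections [1]/[1.6s], total [1s]/[2.4s]")
LAST = ("[2019-01-10T11:05:44,959][INFO ][o.e.m.j.JvmGcMonitorService] [####] "
        "[gc][young][247][22] duration [957ms], collections [1]/[1s], total [957ms]/[13.1s]")

PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*\[gc\]\[young\].*"
    r"total \[[\d.]+(?:ms|s|m|h)\]/\[(?P<value>[\d.]+)(?P<unit>ms|s|m|h)\]"
)
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse(line):
    """Return (timestamp, cumulative young GC seconds) for one log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a [gc][young] line: " + line)
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
    seconds = float(match.group("value")) * UNIT_SECONDS[match.group("unit")]
    return timestamp, seconds


first_ts, first_total = parse(FIRST)
last_ts, last_total = parse(LAST)

wall_seconds = (last_ts - first_ts).total_seconds()
gc_seconds = last_total - first_total
gc_fraction = gc_seconds / wall_seconds

print(f"{gc_seconds:.1f}s of young GC in {wall_seconds:.0f}s wall clock ({gc_fraction:.1%})")
```

With these two entries the difference comes to roughly 10.7 s of young GC over about 199 s, which is in line with the figure quoted above.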

Do you have a rationale for using -XX:CMSInitiatingOccupancyFraction=75? If not, then remove it. It just means you can only use 3/4 of your heap.
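
For reference, HotSpot's documentation describes `-XX:CMSInitiatingOccupancyFraction` as the percentage of old generation occupancy at which a CMS cycle starts. A quick calculation, assuming the 15.2gb old generation capacity and the 10gb peak old generation usage reported in the log above (the function name is just illustrative):

```python
def cms_start_threshold_gb(old_gen_capacity_gb: float, occupancy_percent: float) -> float:
    """Old generation occupancy (in GB) at which a CMS cycle would be initiated."""
    return old_gen_capacity_gb * occupancy_percent / 100.0


OLD_GEN_CAPACITY_GB = 15.2   # from the all_pools [old] figures in the log
PEAK_OLD_GEN_LOGGED_GB = 10.0  # highest [old] value in the posted excerpt

threshold_gb = cms_start_threshold_gb(OLD_GEN_CAPACITY_GB, 75)
headroom_gb = threshold_gb - PEAK_OLD_GEN_LOGGED_GB

print(f"CMS would start at about {threshold_gb:.1f} GB of old generation usage")
print(f"Headroom above the logged peak: about {headroom_gb:.1f} GB")
```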

warkolm · January 10, 2019, 8:21pm

Please don't ping people like that. Most users here are community-based volunteers.

Nitish_Goyal · January 11, 2019, 4:39am

Output of the cluster stats API:


{
  "_nodes": {
    "total": 62,
    "successful": 62,
    "failed": 0
  },
  "cluster_name": "###",
  "timestamp": 1547181459080,
  "status": "green",
  "indices": {
    "count": 2647,
    "shards": {
      "total": 14178,
      "primaries": 7095,
      "replication": 0.99830866807611,
      "index": {
        "shards": { "min": 2, "max": 40, "avg": 5.356252361163581 },
        "primaries": { "min": 1, "max": 20, "avg": 2.680392897619947 },
        "replication": { "min": 0, "max": 1, "avg": 0.9977332829618436 }
      }
    },
    "docs": {
      "count": 45373165921,
      "deleted": 274292654
    },
    "store": {
      "size_in_bytes": 115342384533402
    },
    "fielddata": {
      "memory_size_in_bytes": 29848384,
      "evictions": 0
    },
    "query_cache": {
      "memory_size_in_bytes": 0,
      "total_count": 0,
      "hit_count": 0,
      "miss_count": 0,
      "cache_size": 0,
      "cache_count": 0,
      "evictions": 0
    },
    "completion": {
      "size_in_bytes": 0
    },
    "segments": {
      "count": 279800,
      "memory_in_bytes": 344827997472,
      "terms_memory_in_bytes": 324454122049,
      "stored_fields_memory_in_bytes": 10918551272,
      "term_vectors_memory_in_bytes": 0,
      "norms_memory_in_bytes": 2593041856,
      "points_memory_in_bytes": 5337342615,
      "doc_values_memory_in_bytes": 1524939680,
      "index_writer_memory_in_bytes": 4602154631,
      "version_map_memory_in_bytes": 14059924,
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": -1,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 62,
      "data": 54,
      "coordinating_only": 0,
      "master": 3,
      "ingest": 62
    },
    "versions": ["6.3.0"],
    "os": {
      "available_processors": 1144,
      "allocated_processors": 1144,
      "names": [
        { "name": "Linux", "count": 62 }
      ],
      "mem": {
        "total_in_bytes": 5514092630016,
        "free_in_bytes": 609798819840,
        "used_in_bytes": 4904293810176,
        "free_percent": 11,
        "used_percent": 89
      }
    },
    "process": {
      "cpu": { "percent": 363 },
      "open_file_descriptors": { "min": 1635, "max": 4105, "avg": 3102 }
    },
    "jvm": {
      "max_uptime_in_millis": 11555411240,
      "versions": [
        {
          "version": "1.8.0_181",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "25.181-b13",
          "vm_vendor": "Oracle Corporation",
          "count": 8
        },
        {
          "version": "1.8.0_191",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "25.191-b12",
          "vm_vendor": "Oracle Corporation",
          "count": 54
        }
      ],
      "mem": {
        "heap_used_in_bytes": 744203944496,
        "heap_max_in_bytes": 1928486518784
      },
      "threads": 11634
    },
    "fs": {
      "total_in_bytes": 345595116142592,
      "free_in_bytes": 230171333484544,
      "available_in_bytes": 230171333484544
    },
    "plugins": [],
    "network_types": {
      "transport_types": { "security4": 62 },
      "http_types": { "security4": 62 }
    }
  }
}
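
To make these cluster-wide totals easier to read, here is a rough Python sketch that turns a few of them into per-node averages. It assumes all shards and segments live on the 54 data nodes and are spread evenly, which the stats do not actually confirm; the heap figures cover all 62 nodes.

```python
# Values copied from the cluster stats output above.
STATS = {
    "data_nodes": 54,
    "shards_total": 14178,
    "segments_memory_in_bytes": 344827997472,
    "terms_memory_in_bytes": 324454122049,
    "heap_used_in_bytes": 744203944496,
    "heap_max_in_bytes": 1928486518784,
}
GIB = 1024 ** 3

# Average shard count per data node (assumes an even spread).
shards_per_data_node = STATS["shards_total"] / STATS["data_nodes"]

# Average reported segment memory per data node, in GiB.
segment_memory_per_data_node_gib = (
    STATS["segments_memory_in_bytes"] / STATS["data_nodes"] / GIB
)

# Share of segment memory attributed to terms.
terms_share = STATS["terms_memory_in_bytes"] / STATS["segments_memory_in_bytes"]

# Cluster-wide heap usage as a fraction of the maximum heap.
heap_used_fraction = STATS["heap_used_in_bytes"] / STATS["heap_max_in_bytes"]

print(f"shards per data node:          {shards_per_data_node:.0f}")
print(f"segment memory per data node:  {segment_memory_per_data_node_gib:.2f} GiB")
print(f"terms share of segment memory: {terms_share:.1%}")
print(f"heap used (all nodes):         {heap_used_fraction:.1%}")
```

Under those assumptions, the reported segment memory averages just under 6 GiB per data node, most of it terms memory.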

Nitish_Goyal · January 14, 2019, 4:37am

@elastic

system · February 11, 2019, 4:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
