GC: collection_count questions

genesis · September 10, 2015, 2:55pm

We have been having some issues with our production cluster, which is composed of 7 nodes. Right now the cluster is green, but I see something that is not making me feel comfortable about its state.

"version" : {
"number" : "1.7.1",

java full version "1.8.0_51-b16"

I ran the node stats command and noticed that the "old" GC collection counts are too high on all nodes.
According to what I read online: "The old generation collection count should remain small, and have a small collection_time_in_millis."

   "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 14830,
          "collection_time_in_millis" : 993189
        },
        "old" : {
          "collection_count" : 2857,
          "collection_time_in_millis" : 1082151
        }
      }

--
    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 14127,
          "collection_time_in_millis" : 947026
        },
        "old" : {
          "collection_count" : 2809,
          "collection_time_in_millis" : 1814255
        }
      }

    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 27328,
          "collection_time_in_millis" : 2547057
        },
        "old" : {
          "collection_count" : 2493,
          "collection_time_in_millis" : 284310
        }
      }

--
    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 15054,
          "collection_time_in_millis" : 894937
        },
        "old" : {
          "collection_count" : 2829,
          "collection_time_in_millis" : 511283
        }
      }

    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 13068,
          "collection_time_in_millis" : 1075700
        },
        "old" : {
          "collection_count" : 2636,
          "collection_time_in_millis" : 378742
        }
      }

--
    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 13100,
          "collection_time_in_millis" : 1208937
        },
        "old" : {
          "collection_count" : 2592,
          "collection_time_in_millis" : 714342
        }
      }

    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 22154,
          "collection_time_in_millis" : 1379513
        },
        "old" : {
          "collection_count" : 2701,
          "collection_time_in_millis" : 281411
        }
      }
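To put these counters in perspective, it can help to divide each collector's total time by its count. The sketch below does that for the seven "gc" blocks above, in the order they were pasted (the post does not say which block belongs to which node, so the block numbers are not node numbers). For a concurrent old-generation collector the reported time may include work done alongside the application, so these averages should be read as time per collection, not necessarily stop-the-world pause time. The names `GC_STATS`, `avg_collection_ms` and `summarize` are hypothetical helpers, not part of any Elasticsearch API.

```python
# Young/old GC counters copied from the "gc" blocks in the post, in paste order.
# Which block belongs to which node is not stated, so these are block indices only.
GC_STATS = [
    # (young_count, young_ms, old_count, old_ms)
    (14830, 993189, 2857, 1082151),
    (14127, 947026, 2809, 1814255),
    (27328, 2547057, 2493, 284310),
    (15054, 894937, 2829, 511283),
    (13068, 1075700, 2636, 378742),
    (13100, 1208937, 2592, 714342),
    (22154, 1379513, 2701, 281411),
]


def avg_collection_ms(count: int, total_ms: int) -> float:
    """Average reported time per collection; 0.0 if the collector never ran."""
    return total_ms / count if count else 0.0


def summarize(stats):
    rows = []
    for young_n, young_ms, old_n, old_ms in stats:
        rows.append({
            "avg_young_ms": avg_collection_ms(young_n, young_ms),
            "avg_old_ms": avg_collection_ms(old_n, old_ms),
            # Fraction of all reported GC time that went to the old collector.
            "old_share": old_ms / (young_ms + old_ms),
        })
    return rows


if __name__ == "__main__":
    for i, row in enumerate(summarize(GC_STATS), start=1):
        print(f"block {i}: young avg {row['avg_young_ms']:.1f} ms, "
              f"old avg {row['avg_old_ms']:.1f} ms, "
              f"old share {row['old_share']:.0%}")
```

The second block stands out: roughly 646 ms per old collection, against roughly 100 to 380 ms for the others.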

The server uptime is as follows:
Node1: 1 day
Node2: 28 days
Node3: 28 days
Node4: 28 days
Node5: 28 days
Node6: 20 days
Node7: 1 day
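Because collection_count and collection_time_in_millis are cumulative since the JVM started, they only mean something relative to uptime. A minimal sketch, assuming the counter and the uptime come from the same node: the hypothetical helper `per_hour` turns a cumulative count into an average rate per hour. The pairing in the example (2,857 old collections over 1 day versus 28 days) is hypothetical and only shows how much uptime changes the picture. It also converts the uptime_in_millis of the first "jvm" block in the post to days.

```python
MS_PER_HOUR = 60 * 60 * 1000
MS_PER_DAY = 24 * MS_PER_HOUR


def per_hour(cumulative_count: int, uptime_ms: int) -> float:
    """Average rate of a cumulative counter over the JVM's uptime."""
    if uptime_ms <= 0:
        raise ValueError("uptime must be positive")
    return cumulative_count / (uptime_ms / MS_PER_HOUR)


def to_days(uptime_ms: int) -> float:
    return uptime_ms / MS_PER_DAY


if __name__ == "__main__":
    # Hypothetical pairing: the same old-GC count over two different uptimes.
    old_count = 2857
    for days in (1, 28):
        rate = per_hour(old_count, days * MS_PER_DAY)
        print(f"{old_count} old GCs over {days} day(s): {rate:.1f} per hour")
    # uptime_in_millis from the first "jvm" block in the post.
    print(f"JVM uptime: {to_days(145190322):.2f} days")
```

Note that the JVM uptimes in the "jvm" blocks correspond to about 1.7 days, which is worth keeping in mind when comparing them with the server uptimes listed above.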

I also noticed that heap_used_percent is above 75% most of the time:
  "jvm" : {
    "timestamp" : 1441893597402,
    "uptime_in_millis" : 145190322,
    "mem" : {
      "heap_used_in_bytes" : 19953288096,
      "heap_used_percent" : 84,
      "heap_committed_in_bytes" : 23587454976,
      "heap_max_in_bytes" : 23587454976,
      "non_heap_used_in_bytes" : 129210576,
      "non_heap_committed_in_bytes" : 132513792,
      "pools" : {

  "jvm" : {
    "timestamp" : 1441893597241,
    "uptime_in_millis" : 145200995,
    "mem" : {
      "heap_used_in_bytes" : 20344159872,
      "heap_used_percent" : 86,
      "heap_committed_in_bytes" : 23587454976,
      "heap_max_in_bytes" : 23587454976,
      "non_heap_used_in_bytes" : 138011320,
      "non_heap_committed_in_bytes" : 140562432,
      "pools" : {

  "jvm" : {
    "timestamp" : 1441893596704,
    "uptime_in_millis" : 143036366,
    "mem" : {
      "heap_used_in_bytes" : 20152739200,
      "heap_used_percent" : 81,
      "heap_committed_in_bytes" : 24661196800,
      "heap_max_in_bytes" : 24661196800,
      "non_heap_used_in_bytes" : 125541880,
      "non_heap_committed_in_bytes" : 128040960,
      "pools" : {
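heap_used_percent appears to be heap_used_in_bytes × 100 / heap_max_in_bytes with the fraction truncated: recomputing it that way from the three "jvm" blocks reproduces the reported 84, 86 and 81 (rounding would have given 85 for the first one). The sketch below does that and also shows the remaining headroom in absolute terms. The 75% cut-off is the one mentioned in the post; `heap_used_percent` and `headroom_gib` are illustrative helpers, not Elasticsearch functions.

```python
# (heap_used_in_bytes, heap_max_in_bytes) from the three "jvm" blocks in the post.
JVM_MEM = [
    (19_953_288_096, 23_587_454_976),
    (20_344_159_872, 23_587_454_976),
    (20_152_739_200, 24_661_196_800),
]

GIB = 1024 ** 3


def heap_used_percent(used: int, maximum: int) -> int:
    """Whole-number percentage, truncated rather than rounded."""
    if maximum <= 0:
        return -1
    return used * 100 // maximum


def headroom_gib(used: int, maximum: int) -> float:
    """Heap still available before reaching the maximum, in GiB."""
    return (maximum - used) / GIB


if __name__ == "__main__":
    for used, maximum in JVM_MEM:
        pct = heap_used_percent(used, maximum)
        flag = "above 75%" if pct > 75 else "ok"
        print(f"{pct}% used, {headroom_gib(used, maximum):.2f} GiB free ({flag})")
```

Integer division keeps the result exact for byte counts of this size, which is why it is used instead of float arithmetic.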

These are my elasticsearch.yml settings:
path.data: /var/data/elasticsearch
cluster.name: GM-RTD
node.master: true
node.name: ElasticSearch-1
http.cors.enabled: true
plugin.mandatory: cloud-aws
bootstrap.mlockall: true
cloud.aws.access_key: XXXXXX
cloud.aws.secret_key: XXXXXXX
discovery.type: ec2
discovery.zen.ping.multicast.enabled: false
discovery.ec2.groups: GM-VPC
discovery.zen.minimum_master_nodes: 1
gateway.recover_after_nodes: 1
gateway.recover_after_time: 5m
gateway.expected_nodes: 2

The heap size is set to:
ES_HEAP_SIZE=22g
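As a cross-check of ES_HEAP_SIZE=22g against the heap_max_in_bytes values in the "jvm" blocks, the sketch below converts them to GiB. Two of the nodes report just under 22 GiB, while the third reports just under 23 GiB, which may mean that node was started with a different heap setting; that is an inference from the numbers, not something stated in the thread. The 0.5 GiB tolerance and the helper names are arbitrary choices for illustration.

```python
GIB = 1024 ** 3
CONFIGURED_GIB = 22  # ES_HEAP_SIZE=22g from the post

# heap_max_in_bytes from the three "jvm" blocks in the post.
HEAP_MAX_BYTES = [23_587_454_976, 23_587_454_976, 24_661_196_800]


def to_gib(n_bytes: int) -> float:
    return n_bytes / GIB


def differs_from_configured(n_bytes: int, configured_gib: float,
                            tolerance_gib: float = 0.5) -> bool:
    # The reported max can sit slightly below the configured value
    # (21.97 vs 22 here), so allow some slack before flagging a mismatch.
    return abs(to_gib(n_bytes) - configured_gib) > tolerance_gib


if __name__ == "__main__":
    for n in HEAP_MAX_BYTES:
        note = "differs from ES_HEAP_SIZE" if differs_from_configured(n, CONFIGURED_GIB) else "matches"
        print(f"{to_gib(n):.2f} GiB ({note})")
```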

sudo sysctl -a | grep vm.max_map_count
vm.max_map_count = 262144

  "process" : {
    "refresh_interval_in_millis" : 1000,
    "id" : 32383,
    "max_file_descriptors" : 65535,
    "mlockall" : true
  }

javanna · September 10, 2015, 3:35pm

When the percentage of heap used is between 70% and 80%, that is a good signal that it's time to either add memory or look into what is using it. That may very well also explain the high number of old-generation collections.

Are you using aggregations, sorting or scripting? Those are the features that make heavy use of the field data cache, which is the main contributor to high heap usage. Also, which version of Elasticsearch are you on?

genesis · September 10, 2015, 6:59pm

Thanks for getting back to us. Here is the information you requested:
"version" : {
  "number" : "1.7.1",
  "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
  "build_timestamp" : "2015-07-29T09:54:16Z",
  "build_snapshot" : false,
  "lucene_version" : "4.10.4"
Yes, our analytics team does a lot of aggregations and scripting.
These are the field data statistics for each node:
    "fielddata" : {
      "limit_size_in_bytes" : 14152472985,
      "limit_size" : "13.1gb",
      "estimated_size_in_bytes" : 0,
      "estimated_size" : "0b",
      "overhead" : 1.03,
      "tripped" : 0

    "fielddata" : {
      "memory_size_in_bytes" : 4739124,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
    "fielddata" : {
      "limit_size_in_bytes" : 14152472985,
      "limit_size" : "13.1gb",
      "estimated_size_in_bytes" : 4739124,
      "estimated_size" : "4.5mb",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "request" : {
      "limit_size_in_bytes" : 9434981990,
      "limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 106943440,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
    "fielddata" : {
      "limit_size_in_bytes" : 14152472985,
      "limit_size" : "13.1gb",
      "estimated_size_in_bytes" : 106943440,
      "estimated_size" : "101.9mb",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "request" : {
      "limit_size_in_bytes" : 9434981990,
      "limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 53842652,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
    "fielddata" : {
      "limit_size_in_bytes" : 14152472985,
      "limit_size" : "13.1gb",
      "estimated_size_in_bytes" : 53842652,
      "estimated_size" : "51.3mb",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "request" : {
      "limit_size_in_bytes" : 9434981990,
      "limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
    "fielddata" : {
      "limit_size_in_bytes" : 14796718080,
      "limit_size" : "13.7gb",
      "estimated_size_in_bytes" : 0,
      "estimated_size" : "0b",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "request" : {
      "limit_size_in_bytes" : 9864478720,
      "limit_size" : "9.1gb",

    "fielddata" : {
      "memory_size_in_bytes" : 3992540,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
    "fielddata" : {
      "limit_size_in_bytes" : 14152472985,
      "limit_size" : "13.1gb",
      "estimated_size_in_bytes" : 3992540,
      "estimated_size" : "3.8mb",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "request" : {
      "limit_size_in_bytes" : 9434981990,
      "limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 71535588,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
    "fielddata" : {
      "limit_size_in_bytes" : 14152472985,
      "limit_size" : "13.1gb",
      "estimated_size_in_bytes" : 71535588,
      "estimated_size" : "68.2mb",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "request" : {
      "limit_size_in_bytes" : 9434981990,
      "limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 52855724,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0
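The breaker limits above line up with what, as far as I know, are the Elasticsearch 1.x defaults (fielddata breaker at 60% of the heap, request breaker at 40%): dividing each limit by a heap_max_in_bytes value from the "jvm" blocks gives exactly those ratios. Pairing each limit with a heap size is an assumption based on matching the numbers. The same arithmetic shows that the fielddata actually loaded (memory_size_in_bytes) is well under 1% of the heap on every listed node, so on these numbers alone fielddata does not look like where most of the heap is going. The helper `share` and the constant names are hypothetical.

```python
# (heap_max_in_bytes, fielddata breaker limit, request breaker limit);
# pairing limits with heap sizes is an assumption based on matching values.
BREAKERS = [
    (23_587_454_976, 14_152_472_985, 9_434_981_990),
    (24_661_196_800, 14_796_718_080, 9_864_478_720),
]

# indices "fielddata" memory_size_in_bytes, in the order the blocks were pasted.
FIELDDATA_BYTES = [4_739_124, 106_943_440, 53_842_652, 0,
                   3_992_540, 71_535_588, 52_855_724]

# Assumption: compare against the smaller, most common heap_max_in_bytes.
HEAP_MAX = 23_587_454_976
MIB = 1024 ** 2


def share(part: int, whole: int) -> float:
    return part / whole


if __name__ == "__main__":
    for heap, fd_limit, req_limit in BREAKERS:
        print(f"heap {heap}: fielddata breaker {share(fd_limit, heap):.0%}, "
              f"request breaker {share(req_limit, heap):.0%}")
    largest = max(FIELDDATA_BYTES)
    print(f"largest fielddata: {largest / MIB:.1f} MiB = "
          f"{share(largest, HEAP_MAX):.2%} of heap")
    print(f"total fielddata across listed nodes: {sum(FIELDDATA_BYTES) / MIB:.1f} MiB")
```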

warkolm · September 11, 2015, 5:35am

Then, as mentioned, you probably need to scale to cope with your load.
