GC: collection_count questions


#1

We have been having some issues with our production cluster, which is composed of 7 nodes. Right now the cluster is green, but I see something that is making me feel uncomfortable about its state.

"version" : {
"number" : "1.7.1",

java full version "1.8.0_51-b16"

I ran the node stats command and noticed that the "old" GC collector counts are high on all nodes.
According to what I read online: "The old generation collection count should remain small, and have a small collection_time_in_millis"

   "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 14830,
          "collection_time_in_millis" : 993189
        },
        "old" : {
          "collection_count" : 2857,
          "collection_time_in_millis" : 1082151
        }
      }

--
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 14127,
"collection_time_in_millis" : 947026
},
"old" : {
"collection_count" : 2809,
"collection_time_in_millis" : 1814255
}
}

    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 27328,
          "collection_time_in_millis" : 2547057
        },
        "old" : {
          "collection_count" : 2493,
          "collection_time_in_millis" : 284310
        }
      }

--
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 15054,
"collection_time_in_millis" : 894937
},
"old" : {
"collection_count" : 2829,
"collection_time_in_millis" : 511283
}
}

    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 13068,
          "collection_time_in_millis" : 1075700
        },
        "old" : {
          "collection_count" : 2636,
          "collection_time_in_millis" : 378742
        }
      }

--
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 13100,
"collection_time_in_millis" : 1208937
},
"old" : {
"collection_count" : 2592,
"collection_time_in_millis" : 714342
}
}

    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 22154,
          "collection_time_in_millis" : 1379513
        },
        "old" : {
          "collection_count" : 2701,
          "collection_time_in_millis" : 281411
        }
      }
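
To put the raw counters above in perspective, it can help to divide collection_time_in_millis by collection_count to get an average time per collection. Below is a minimal Python sketch using the first two "old" collector readings pasted above; the function name avg_pause_ms is made up for illustration and is not part of any Elasticsearch API.

```python
def avg_pause_ms(collection_count, collection_time_in_millis):
    """Average time per collection in milliseconds (0.0 if no collections ran)."""
    if collection_count == 0:
        return 0.0
    return collection_time_in_millis / collection_count


# (collection_count, collection_time_in_millis) for the "old" collector,
# copied from the first two gc blocks in the post above.
old_gc_samples = [(2857, 1082151), (2809, 1814255)]

for count, total_ms in old_gc_samples:
    avg = avg_pause_ms(count, total_ms)
    print(f"old GC: {count} collections, ~{avg:.0f} ms each on average")
```

For these two samples the average comes out at several hundred milliseconds per old-generation collection. The counters are cumulative since the JVM started, so they are best compared against each node's JVM uptime.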

The server uptime is as follows:
Node1: 1 day
Node2: 28 days
Node3: 28 days
Node4: 28 days
Node5: 28 days
Node6: 20 days
Node7: 1 day

I also noticed that heap_used_percent is above 75% most of the time:
"jvm" : {
"timestamp" : 1441893597402,
"uptime_in_millis" : 145190322,
"mem" : {
"heap_used_in_bytes" : 19953288096,
"heap_used_percent" : 84,
"heap_committed_in_bytes" : 23587454976,
"heap_max_in_bytes" : 23587454976,
"non_heap_used_in_bytes" : 129210576,
"non_heap_committed_in_bytes" : 132513792,
"pools" : {

  "jvm" : {
    "timestamp" : 1441893597241,
    "uptime_in_millis" : 145200995,
    "mem" : {
      "heap_used_in_bytes" : 20344159872,
      "heap_used_percent" : 86,
      "heap_committed_in_bytes" : 23587454976,
      "heap_max_in_bytes" : 23587454976,
      "non_heap_used_in_bytes" : 138011320,
      "non_heap_committed_in_bytes" : 140562432,
      "pools" : {

  "jvm" : {
    "timestamp" : 1441893596704,
    "uptime_in_millis" : 143036366,
    "mem" : {
      "heap_used_in_bytes" : 20152739200,
      "heap_used_percent" : 81,
      "heap_committed_in_bytes" : 24661196800,
      "heap_max_in_bytes" : 24661196800,
      "non_heap_used_in_bytes" : 125541880,
      "non_heap_committed_in_bytes" : 128040960,
      "pools" : {

These are my elasticsearch.yml settings:
path.data: /var/data/elasticsearch
cluster.name: GM-RTD
node.master: true
node.name: ElasticSearch-1
http.cors.enabled: true
plugin.mandatory: cloud-aws
bootstrap.mlockall: true
cloud.aws.access_key: XXXXXX
cloud.aws.secret_key: XXXXXXX
discovery.type: ec2
discovery.zen.ping.multicast.enabled: false
discovery.ec2.groups: GM-VPC
discovery.zen.minimum_master_nodes: 1
gateway.recover_after_nodes: 1
gateway.recover_after_time: 5m
gateway.expected_nodes: 2

The heap size is set to:
ES_HEAP_SIZE=22g

sudo sysctl -a | grep vm.max_map_count
vm.max_map_count = 262144

"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 32383,
"max_file_descriptors" : 65535,
"mlockall" : true
}


(Luca Cavanna) #2

When heap usage sits between 70% and 80%, that is a good signal that it's time to either add memory or look into what's using it. It might very well also explain the high old-generation collection counts.

Are you using aggregations, sorting or scripting? Those are the features that make heavy use of the fielddata cache, which is the main cause of high heap usage. Also, which version of Elasticsearch are you on?
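
As a rough illustration of the 70% to 80% rule of thumb above, the heap figures from post #1 can be checked with a few lines of Python. This is only a sketch: the 75% cut-off and the names heap_used_percent and WARN_THRESHOLD_PERCENT are assumptions picked for the example, not Elasticsearch settings.

```python
# Assumption: an illustrative cut-off inside the 70-80% range mentioned above.
WARN_THRESHOLD_PERCENT = 75


def heap_used_percent(heap_used_in_bytes, heap_max_in_bytes):
    """Whole-number percentage of the maximum heap currently in use."""
    return heap_used_in_bytes * 100 // heap_max_in_bytes


# (heap_used_in_bytes, heap_max_in_bytes) from the three jvm blocks in post #1.
samples = [
    (19953288096, 23587454976),
    (20344159872, 23587454976),
    (20152739200, 24661196800),
]

for used, maximum in samples:
    pct = heap_used_percent(used, maximum)
    status = "investigate" if pct >= WARN_THRESHOLD_PERCENT else "ok"
    print(f"heap used: {pct}% ({status})")
```

All three samples land above the assumed threshold, which matches the heap_used_percent values the node stats reported.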


#3

Thanks for getting back to us, here is the info you requested:
"version" : {
"number" : "1.7.1",
"build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
"build_timestamp" : "2015-07-29T09:54:16Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
Yes, our analytics team does a lot of aggregations and scripting.
These are the fielddata stats per node:
"fielddata" : {
"limit_size_in_bytes" : 14152472985,
"limit_size" : "13.1gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.03,
"tripped" : 0

    "fielddata" : {
      "memory_size_in_bytes" : 4739124,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
"fielddata" : {
"limit_size_in_bytes" : 14152472985,
"limit_size" : "13.1gb",
"estimated_size_in_bytes" : 4739124,
"estimated_size" : "4.5mb",
"overhead" : 1.03,
"tripped" : 0
},
"request" : {
"limit_size_in_bytes" : 9434981990,
"limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 106943440,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
"fielddata" : {
"limit_size_in_bytes" : 14152472985,
"limit_size" : "13.1gb",
"estimated_size_in_bytes" : 106943440,
"estimated_size" : "101.9mb",
"overhead" : 1.03,
"tripped" : 0
},
"request" : {
"limit_size_in_bytes" : 9434981990,
"limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 53842652,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
"fielddata" : {
"limit_size_in_bytes" : 14152472985,
"limit_size" : "13.1gb",
"estimated_size_in_bytes" : 53842652,
"estimated_size" : "51.3mb",
"overhead" : 1.03,
"tripped" : 0
},
"request" : {
"limit_size_in_bytes" : 9434981990,
"limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
"fielddata" : {
"limit_size_in_bytes" : 14796718080,
"limit_size" : "13.7gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.03,
"tripped" : 0
},
"request" : {
"limit_size_in_bytes" : 9864478720,
"limit_size" : "9.1gb",

    "fielddata" : {
      "memory_size_in_bytes" : 3992540,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
"fielddata" : {
"limit_size_in_bytes" : 14152472985,
"limit_size" : "13.1gb",
"estimated_size_in_bytes" : 3992540,
"estimated_size" : "3.8mb",
"overhead" : 1.03,
"tripped" : 0
},
"request" : {
"limit_size_in_bytes" : 9434981990,
"limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 71535588,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

--
"fielddata" : {
"limit_size_in_bytes" : 14152472985,
"limit_size" : "13.1gb",
"estimated_size_in_bytes" : 71535588,
"estimated_size" : "68.2mb",
"overhead" : 1.03,
"tripped" : 0
},
"request" : {
"limit_size_in_bytes" : 9434981990,
"limit_size" : "8.7gb",

    "fielddata" : {
      "memory_size_in_bytes" : 52855724,
      "evictions" : 0
    },
    "percolate" : {
      "total" : 0,
      "time_in_millis" : 0,
      "current" : 0,
      "memory_size_in_bytes" : -1,
      "memory_size" : "-1b",
      "queries" : 0

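One way to read the breaker output above is to express each estimated_size_in_bytes as a share of its limit_size_in_bytes. The Python sketch below does that for the values pasted in this post; the helper name fielddata_percent_of_limit is hypothetical and only used for illustration.

```python
def fielddata_percent_of_limit(estimated_size_in_bytes, limit_size_in_bytes):
    """Fielddata breaker estimate as a percentage of its configured limit."""
    return 100.0 * estimated_size_in_bytes / limit_size_in_bytes


# (estimated_size_in_bytes, limit_size_in_bytes) copied from the
# fielddata breaker blocks pasted above.
breaker_samples = [
    (0, 14152472985),
    (4739124, 14152472985),
    (106943440, 14152472985),
    (53842652, 14152472985),
    (0, 14796718080),
    (3992540, 14152472985),
    (71535588, 14152472985),
]

for estimated, limit in breaker_samples:
    pct = fielddata_percent_of_limit(estimated, limit)
    print(f"fielddata: {pct:.2f}% of breaker limit")
```

In these samples the estimate stays under 1% of the limit on every node, so at the moment the stats were taken fielddata alone would not seem to account for the heap usage reported earlier.
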
(Mark Walkom) #4

Then, as mentioned, you probably need to scale to cope with your load.

