Understanding HEAP Usage with heavy bulk indexing

We have a reasonably large ES cluster and we're struggling to understand our Java HEAP usage. Over the last few weeks, nearly every day during our heaviest bulk indexing period, we have seen the JVM go into garbage collection loops as soon as the ES cluster starts receiving Kibana searches.

Cluster

At a high level, our cluster looks like this:

  • 3 x Master Nodes: c3.large instances
  • 8 x Data Nodes: 3.8xlarge w/ 5TB Provisioned IOPS EBS (4000 IOPS per node)
  • 8 x Client Nodes: Flume Indexing nodes that dump data to the local ES client
  • 2 x Client Nodes: Kibana Log Searching nodes

Data? We have that ... 15TB of storage over 30 days of data. Roughly 300-400GB/day, with about 300 million events stored per day.

Fields? We have those too. ~300-400 unique fields are added to the schema per day. I know, it's more than is ideal ... but they are extremely valuable to our engineers, so we're not going to be changing that any time soon.

Finally, we have metrics. We're not using Marvel, but we are collecting all the metrics via Collectd and graphing them.

Config File

bootstrap: 
  mlockall: true
cloud: 
  aws: 
    access_key: XXX
    region: us-east-1
    secret_key: XXX
  node: 
    auto_attributes: true
cluster: 
  name: flume-elasticsearch-production_vpc-useast1
  routing: 
    allocation: 
      allow_rebalance: indices_all_active
      cluster_concurrent_rebalance: 2
      node_concurrent_recoveries: 8
      same_shard: 
        host: true
discovery: 
  ec2: 
    any_group: false
    groups: 
      - default
      - OPS3-FLUME-ES
    host_type: private_ip
  type: ec2
  zen: 
    minimum_master_nodes: 2
    ping: 
      multicast: 
        enabled: false
      unicast: 
        enabled: false
gateway: 
  recover_after_time: 5m
index: 
  number_of_replicas: 1
  number_of_shards: 10
indices: 
  breaker: 
    fielddata: 
      limit: 85%
  fielddata: 
    cache: 
      size: 25%
  recovery: 
    max_bytes_per_sec: -1
  store: 
    throttle: 
      max_bytes_per_sec: -1
  translog: 
    flush_threshold_size: 1024mb
node: 
  data: false
  master: true
  name: prod-flume-es-master-useast1-114-i-c199bd6d-flume-elasticsearch-production_vpc-useast1
path: 
  data: /mnt/elasticsearch/flume-elasticsearch-production_vpc-useast1

Graphs?

You can see here that we hit our JVM heap max a few times -- yet the fielddata cache size is definitely not the main culprit.

Help

What I'm looking for here is help figuring out where the rest of the HEAP is being used. What metrics should we be looking at to figure out where the missing memory is going?

There are a bunch of stats that might be useful here; Marvel gives you better insight into them and the correlations between them.

But are you monitoring threadpools?

@warkolm - Hi Mark, do we have a blog post/wiki that talks about all the things that consume memory -- like Lucene memory, the index buffer, the filter cache, the query cache, etc. -- and how to tweak/monitor them?

@warkolm, we are monitoring all of the thread_pool metrics -- but it's not clear to me which pools may be using more memory than others, or if there is even a direct correlation.
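
For context, a minimal sketch of pulling those thread_pool metrics straight from the node stats API -- the localhost:9200 address is a placeholder, and the field names assume the ES 1.x stats layout:

# Sketch: print active/queue/rejected counts for every thread pool on every node.
# Placeholder host; point it at any node in the cluster.
import json
import urllib.request

stats = json.load(urllib.request.urlopen("http://localhost:9200/_nodes/stats/thread_pool"))

for node in stats["nodes"].values():
    for pool, m in sorted(node["thread_pool"].items()):
        print("{} {:<12} active={} queue={} rejected={}".format(
            node["name"], pool, m["active"], m["queue"], m["rejected"]))

Rejected counts point at saturated pools, but none of these numbers map directly onto heap usage, which is the part we're missing.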

@mosiddi, that's exactly what I'm looking for. I want to know exactly what things use HEAP memory in ES, and how to monitor all of those individually.


Anyone have any ideas here? We're still trying to understand the HEAP usage better...

Hi,

You could get a heap dump and look at it - that would be the most accurate way to see what's in your ES JVM heap.
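
A minimal sketch of that approach, assuming the JDK's jps/jmap tools are installed on the node and it is run as the same user as the Elasticsearch process (note that writing the dump pauses the JVM):

# Sketch: locate the Elasticsearch JVM and write a heap dump of live objects.
# Analyze the resulting .hprof with Eclipse MAT or jhat.
import subprocess

jps = subprocess.check_output(["jps", "-l"]).decode()
pid = next(line.split()[0] for line in jps.splitlines()
           if "elasticsearch" in line.lower())

# -dump:live forces a full GC first, so only reachable objects end up in the dump
subprocess.check_call(["jmap", "-dump:live,format=b,file=/tmp/es-heap.hprof", pid])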

Otis

Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

Hm, well it's hard to say since you are both indexing and querying via Kibana. So the heap usage could really be any component of ES. Is it possible to isolate a period of time that is index-only or search-only and watch the heap usage? I'm assuming that isn't possible, but I thought I'd ask =)

Indexing heap usage tends to look like a gentle sawtooth, since the indexing process generates a certain amount of garbage which is then periodically collected quickly. If you squint, your graph sorta looks like that, plus additional heap usage from queries.

Index-time heap users that I can think of: general garbage, the initial bulk request, the Lucene indexing buffer, and a small amount of memory while segments are being merged.

Search-time heap usage could come from a whole variety of causes, in no particular order: field data, transient memory used by bucket aggregations, the filter cache, the parent/child id_cache, very large terms filters, the terms lookup cache, and the query cache.

To be perfectly honest, in situations like this where you aren't sure what the culprit is...it's probably easiest to just graph all the memory-related stats that Node Stats spits out and see which one is eating up your heap.
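
A rough sketch of that, assuming a node reachable on localhost:9200 and the ES 1.x node stats layout -- it just walks the JSON and prints every byte-sized field whose path looks memory-related, so the biggest consumers stand out:

# Sketch: walk /_nodes/stats and print every "*_in_bytes" field whose path mentions
# memory, cache, buffer, fielddata or filter. Placeholder host.
import json
import urllib.request

stats = json.load(urllib.request.urlopen("http://localhost:9200/_nodes/stats"))

def walk(obj, path=""):
    if isinstance(obj, dict):
        for key, value in obj.items():
            walk(value, path + "." + key if path else key)
    elif path.endswith("_in_bytes") and any(
            w in path for w in ("memory", "cache", "buffer", "fielddata", "filter")):
        print("{:>15,d}  {}".format(obj, path))

for node in stats["nodes"].values():
    print("== " + node["name"])
    walk(node["indices"])
    walk(node.get("breakers", {}), "breakers")  # breaker stats may be absent on older 1.x

Feed the same fields into your Collectd pipeline instead of printing them and you have per-component graphs to line up against the heap graph.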

Some questions:

  • What version of ES?
  • Are the GCs you're seeing also slow GCs (e.g. flagged as slow in your logs)?
  • Are you using Kibana 3 or 4?
  • Are you controlling bulk index size? Bulks have to sit in memory on the coordinating node while they are processed, so if you fire off several 1GB bulks...that's a couple of gigs that are temporarily sitting on the heap until they are broken up and dispatched to other nodes (see the sketch after this list).
  • Do you see this pattern on all nodes? Are you round-robin'ing between nodes equally?
  • You mentioned 300-400 new fields "per day"...does that mean each day has 300-400 random fields, or that today has 300, tomorrow has 600, etc? How many indices are you creating per day?
  • Are you a heavy user of aliases?
  • Have you changed the JVM's CMSInitiatingOccupancyFraction setting from the default 75%?
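
On the bulk-size question, a hedged sketch of what capping bulk requests looks like -- purely illustrative, using the Python elasticsearch client's streaming_bulk helper rather than the Flume pipeline this cluster actually uses, with placeholder host, index and type names:

# Illustrative only: cap both the document count and the byte size of each bulk so
# no single request parks a huge payload on the coordinating node's heap.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch(["http://localhost:9200"])          # placeholder client node

def actions(events):
    for event in events:
        yield {"_index": "logstash-2015.01.01",        # placeholder index name
               "_type": "event",                       # placeholder type (ES 1.x era)
               "_source": event}

events = [{"message": "example {}".format(i)} for i in range(10000)]

for ok, item in streaming_bulk(es, actions(events),
                               chunk_size=2000,                    # docs per bulk
                               max_chunk_bytes=10 * 1024 * 1024):  # ~10MB per bulk
    if not ok:
        print("failed:", item)

Whatever the indexing client, the point is the same: the whole bulk payload sits on the coordinating node's heap until it has been split up and dispatched, so smaller and more frequent bulks keep that transient footprint bounded.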

Addendum: I talked with Mike McCandless and there is an issue I thought was fixed, but it is actually still present and won't be fixed until Lucene 5.0 (i.e. Elasticsearch 2.0).

When Lucene 4.x performs a merge of segments during indexing, it loads the norms data for all fields into memory during the merge. If you have a lot of fields with norms (analyzed string fields in ES by default), this can eat up a very sizeable amount of memory during merging. Since you have a large number of fields, that could very well be your root cause.

In ES 1.x, you'll see the norms memory usage transiently in the Node Stats segments.memory_in_bytes field. That field also includes other Lucene data structures, but it should be conspicuously large if norms are the root cause.
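
A small sketch of watching that field during the heavy indexing window, assuming a node on localhost:9200 and the ES 1.x stats layout -- if norms are the problem, the number should climb while merges run and fall back afterwards:

# Sketch: poll segments.memory_in_bytes per node every 10 seconds. Placeholder host.
import json
import time
import urllib.request

URL = "http://localhost:9200/_nodes/stats/indices"

while True:
    stats = json.load(urllib.request.urlopen(URL))
    for node in stats["nodes"].values():
        segments = node["indices"]["segments"]
        print("{}  segments.memory_in_bytes={:,}".format(
            node["name"], segments["memory_in_bytes"]))
    time.sleep(10)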

In ES 2.0 (i.e. Lucene 5.x), norms are loaded and merged one field at a time, so this is no longer a problem. There will also be more granular memory reporting in ES.
