ES 2.0.2 - Heap near 100% and eventually Elasticsearch locks up

Hello! First post, so apologies if I'm leaving something important out.

We are running into problems roughly weekly where heap utilization floats near 100% after a while and eventually crashes Elasticsearch, requiring a kill -9 to end the process.

Cluster looks like this:

  • 6 servers with 128 GB of memory each
  • 3 nodes are combined master/data nodes
  • 1 node is a combined web/data node
  • Elasticsearch is allocated roughly 31 GB of heap to stay below the compressed oops threshold (see the check after this list)
  • 6 shards per index (one per server) plus 1 replica
  • paging (swap) disabled
  • Indexing 1-10 TB of data per day via bulk queues
  • no plugins (I saw the post about a memory leak in the SSL plugin; we are not using that)
  • Nothing else is running on the machines.
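
For reference, here is a quick way to confirm the heap ceiling and memory locking on every node (a minimal sketch, assuming the cluster answers on localhost:9200):

# Minimal sanity check of per-node heap size and memory locking via the node info API.
# Assumes the cluster is reachable on localhost:9200; adjust the host if needed.
import requests

info = requests.get("http://localhost:9200/_nodes/jvm,process").json()

for node in info["nodes"].values():
    heap_max_gb = node["jvm"]["mem"]["heap_max_in_bytes"] / 1024 ** 3
    mlockall = node["process"].get("mlockall")
    print(f"{node['name']}: heap_max={heap_max_gb:.1f} GB, mlockall={mlockall}")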

What we're seeing is that the heap gradually climbs to near 100%, at which point the process stops responding and the node has to be restarted. I understand that the heap will generally trend upward as caches build up, but I can't imagine it should max itself out like this.

Before a node craps out, we see logs like this:
[WARN ][monitor.jvm ] [gc][old][80118][4448] duration [27.6s], collections [1]/[27.8s], total [27.6s]/[11.1m], memory [30.7gb]->[30.6gb]/[31.9gb], all_pools {[young] [146.2mb]->[153.6mb]/[153.6mb]}{[survivor] [50.8mb]->[847.3kb]/[51.1mb]}{[old] [30.5gb]->[30.4gb]/[31.7gb]}

The old-generation GC spends nearly 30 seconds to reclaim... 100 MB. This happens while there is no activity on the cluster at all, not even a read or an indexing operation.
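
For anyone watching for the same thing, the heap and old-gen GC counters can be pulled straight from the nodes stats API; a rough sketch (assuming localhost:9200, polled once a minute):

# Rough sketch: poll per-node heap usage and old-generation GC totals once a minute
# so the creep is visible before a node stops responding. Assumes localhost:9200.
import time
import requests

while True:
    stats = requests.get("http://localhost:9200/_nodes/stats/jvm").json()
    for node in stats["nodes"].values():
        heap_pct = node["jvm"]["mem"]["heap_used_percent"]
        old_gc = node["jvm"]["gc"]["collectors"]["old"]
        print(f"{node['name']}: heap {heap_pct}%, "
              f"old-gen GCs {old_gc['collection_count']} "
              f"({old_gc['collection_time_in_millis']} ms total)")
    time.sleep(60)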

Heap usage captured in Grafana:

The graph shows the standing heap usage for a while, then a few nodes coming back online at much lower heap after being restarted, then the remaining nodes being restarted; heap usage drops and then rises a little as the cluster rebalances.

Really unsure how to proceed and fix this - I'm about to start attaching JVM tooling to inspect the heap. Any help is greatly appreciated.

First of all, I would like to point out that you really should upgrade, as that is a very old version.

That is not a very healthy heap pattern. It looks like you are beyond what the current cluster can handle. You may want to look into your mappings and make sure that you are using doc_values to as great an extent as possible, and avoid analysing fields you do not need free-text search on. You may also want to read this blog post and have a look at your sharding practices.
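
For example, on 2.x a template along these lines keeps fields that are only filtered or aggregated on not_analyzed (which also gives them doc_values by default) and leaves only genuine free-text fields analysed. The template name, index pattern, and field names below are made up for illustration:

# Hypothetical 2.x index template: not_analyzed strings for filter/aggregation fields
# (doc_values by default), analysed strings only where free-text search is needed.
import requests

template = {
    "template": "client-*",
    "mappings": {
        "event": {
            "properties": {
                "client_id": {"type": "string", "index": "not_analyzed"},
                "status":    {"type": "string", "index": "not_analyzed"},
                "message":   {"type": "string", "index": "analyzed"},  # free-text search only
                "timestamp": {"type": "date"}
            }
        }
    }
}

resp = requests.put("http://localhost:9200/_template/client-events", json=template)
print(resp.json())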

As a more short-term fix, it may make sense to scale out the cluster to get more headroom. You have a good amount of RAM and should be able to run two data nodes per host, which will give you more total heap space.
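
If you do run two data nodes per host, it is also worth making sure a primary and its replica can never land on the same physical machine. A sketch of one way to do that; if the setting is not accepted dynamically on your version, it can go in elasticsearch.yml on each node instead:

# Prevent a primary and its replica from being allocated to the same physical host
# when multiple data nodes run per machine. Sketch only: if 2.0.x does not accept this
# dynamically, set cluster.routing.allocation.same_shard.host: true in elasticsearch.yml.
import requests

settings = {"persistent": {"cluster.routing.allocation.same_shard.host": True}}
resp = requests.put("http://localhost:9200/_cluster/settings", json=settings)
print(resp.json())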

Thanks for the response! An upgrade is in the works.

During normal indexing operations the heap stays between 60% and 80% as expected, so we didn't think cluster sizing was the issue, but usage does creep up over time, so maybe it is... That graph only shows the past few hours, when it was falling over.

I think we have doc_values on by default for all fields since disk space isn't a concern - our sharding strategy is "one shard per host", which I believe was suggested to us at Elasticon a few years back, but perhaps we should play with that a little.
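
To double-check that assumption, something like the following against one of the daily indexes should flag any analysed strings (which cannot use doc_values on 2.x) or fields with doc_values explicitly disabled - the index name is just a placeholder:

# List fields in an index's mappings that would not be backed by doc_values:
# analysed strings, or fields with doc_values explicitly turned off.
# The index name is a placeholder.
import requests

mappings = requests.get("http://localhost:9200/client-example-2017.11.01/_mapping").json()

for index, body in mappings.items():
    for doc_type, mapping in body["mappings"].items():
        for field, props in mapping.get("properties", {}).items():
            analysed_string = (props.get("type") == "string"
                               and props.get("index", "analyzed") == "analyzed")
            if analysed_string or props.get("doc_values") is False:
                print(f"{index}/{doc_type}/{field}: {props}")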

How many shards do you have on average per node? What is the average shard size?
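
If it is easier to pull than to eyeball, something like this summarizes both from the cat shards output (a sketch, assuming localhost:9200):

# Summarize shard count and average shard size per node from the cat shards API.
# bytes=b makes the store column a plain byte count; h= picks just the columns needed.
from collections import defaultdict
import requests

text = requests.get("http://localhost:9200/_cat/shards?h=store,node&bytes=b").text

counts = defaultdict(int)
sizes = defaultdict(int)
for line in text.splitlines():
    parts = line.split()
    # skip unassigned/initializing shards that have no store size yet
    if len(parts) < 2 or not parts[0].isdigit():
        continue
    store, node = int(parts[0]), " ".join(parts[1:])
    counts[node] += 1
    sizes[node] += store

for node, count in counts.items():
    print(f"{node}: {count} shards, avg {sizes[node] / count / 1024 ** 3:.1f} GB")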

Our data is structured as an index per client, per day - some clients index a lot of data (200 GB), others only a few hundred MB - so we implemented a dynamic shard-count calculation last week to adjust for the disparate data sizes. Currently we are running 1 shard for small clients, up to 48 shards for the largest.

The current shard size for our largest client is around 3 GB - adding more shards to the indexes caused indexing errors that appeared to overload Elasticsearch, while with too few shards we ran out of memory fairly quickly.
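
The calculation is along these lines - a simplified sketch, with an illustrative target shard size rather than our exact production numbers:

# Simplified sketch of a per-client shard calculation: aim for a target shard size and
# clamp between the small-client and large-client limits. The target size here is
# illustrative, not the exact production value.
import math

TARGET_SHARD_GB = 10
MIN_SHARDS = 1
MAX_SHARDS = 48

def shards_for(expected_daily_gb: float) -> int:
    """Number of primary shards for one client's daily index."""
    wanted = math.ceil(expected_daily_gb / TARGET_SHARD_GB)
    return max(MIN_SHARDS, min(MAX_SHARDS, wanted))

print(shards_for(0.3))  # small client -> 1 shard
print(shards_for(200))  # large client -> 20 shards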

We could migrate some hardware from our staging environment to the production cluster to help out, but right now we are using it to test the much-needed upgrade.

On a side note, looking at the heap, almost all of it is consumed by arrays of longs:

num     #instances         #bytes  class name
----------------------------------------------
1:       5050508    18500208000  [J
2:      12343057     5045737640  [B
3:       6844994     1701292208  [I
4:      12086275      870211800  org.apache.lucene.util.fst.FST$Arc
5:       5786172      462893760  com.google.common.cache.LocalCache$Segment

Do you happen to know if this is mostly cache?

Thanks!

3 GB shards are not very large. What does the cluster stats API give you?

Not sure what to look for specifically - here is most of it!

   "indices": {
        "count": 306,
        "shards": {
            "total": 3046,
            "primaries": 1523,
            "replication": 1,
            "index": {
                "shards": {
                    "min": 2,
                    "max": 188,
                    "avg": 9.954248366013072
                },
                "primaries": {
                    "min": 1,
                    "max": 94,
                    "avg": 4.977124183006536
                },
                "replication": {
                    "min": 1,
                    "max": 1,
                    "avg": 1
                }
            }
        },
        "docs": {
            "count": 2636982833,
            "deleted": 91661404
        },
        "store": {
            "size_in_bytes": 3162366821984,
            "throttle_time_in_millis": 0
        },
        "fielddata": {
            "memory_size_in_bytes": 0,
            "evictions": 0
        },
        "query_cache": {
            "memory_size_in_bytes": 0,
            "total_count": 0,
            "hit_count": 0,
            "miss_count": 0,
            "cache_size": 0,
            "cache_count": 0,
            "evictions": 0
        },
        "completion": {
            "size_in_bytes": 0
        },
        "segments": {
            "count": 58070,
            "memory_in_bytes": 10465190800,
            "terms_memory_in_bytes": 9873110528,
            "stored_fields_memory_in_bytes": 584415032,
            "term_vectors_memory_in_bytes": 0,
            "norms_memory_in_bytes": 2322800,
            "doc_values_memory_in_bytes": 5342440,
            "index_writer_memory_in_bytes": 0,
            "index_writer_max_memory_in_bytes": 1559552000,
            "version_map_memory_in_bytes": 0,
            "fixed_bit_set_memory_in_bytes": 15538514968
        },
        "percolate": {
            "total": 0,
            "time_in_millis": 0,
            "current": 0,
            "memory_size_in_bytes": -1,
            "memory_size": "-1b",
            "queries": 0
        }
    },
    "nodes": {
        "count": {
            "total": 6,
            "master_only": 0,
            "data_only": 3,
            "master_data": 3,
            "client": 0
        },
        "versions": [
            "2.0.2"
        ],
        "os": {
            "available_processors": 240,
            "mem": {
                "total_in_bytes": 0
            },
            "names": [
                {
                    "name": "Linux",
                    "count": 1
                }
            ]
        },
        "process": {
            "cpu": {
                "percent": 50
            },
            "open_file_descriptors": {
                "min": 16836,
                "max": 17488,
                "avg": 17215
            }
        },
        "jvm": {
            "max_uptime_in_millis": 579544539,
            "versions": [
                {
                    "version": "1.8.0_92",
                    "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
                    "vm_version": "25.92-b14",
                    "vm_vendor": "Oracle Corporation",
                    "count": 6
                }
            ],
            "mem": {
                "heap_used_in_bytes": 147892264024,
                "heap_max_in_bytes": 205823803392
            },
            "threads": 1482
        },
        "fs": {
            "total_in_bytes": 561843758039040,
            "free_in_bytes": 558673320202240,
            "available_in_bytes": 530356196724736,
            "spins": "true"
        },

To update this - we have not upgraded to 5.6.x yet, but we did triple the shard size to around 10 GB each, and this has stabilized the cluster. We have also started closing old indexes sooner, as well as only adding replicas after indexing completes for the bulk indexing jobs.
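
For the record, the replica and closing changes come down to roughly these two calls per daily index, once the bulk job finishes and once an index ages out - the index names below are placeholders:

# Sketch of the two follow-up measures: raise number_of_replicas back to 1 once the
# day's bulk indexing finishes (the index having been created with 0 replicas), and
# close indexes that have aged out of the active query window. Names are placeholders.
import requests

ES = "http://localhost:9200"

# After the bulk job for the day completes, add the replica back.
todays_index = "client-example-2017.11.01"  # placeholder
requests.put(f"{ES}/{todays_index}/_settings",
             json={"index": {"number_of_replicas": 1}})

# Close an index that is old enough to no longer be queried.
old_index = "client-example-2017.09.01"  # placeholder
requests.post(f"{ES}/{old_index}/_close")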
