Probable memory leak: Heap utilization stuck at ~max heap for idle cluster

Cluster setup: version 7.1.1
3 master nodes
3 data nodes (16 cores, 64GB RAM, Xms=Xmx=12GB each)
2 coordinating nodes
650 indices with one shard each.
Total data: ~2TB

Query:
All my data nodes are stuck at ~10GB heap usage, even though there is no searching or indexing being done.
It is unclear to me why the data nodes occupy ~10GB of heap in an idle state.

  1. The heap dump says class "B" occupies 7.77GB of space (heap dump screenshot attached).

  2. Even the accounting circuit breaker gives a higher estimate, but that is a separate question. Node stats:
    https://del.dog/yaqamoteyu.json

  3. Even the thread dump says nothing; none of the threads seem to be doing any kind of work.
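For reference, this is roughly how I am reading the per-node heap numbers, via the cat nodes API (a sketch; localhost:9200 and the absence of auth are stand-ins for my setup):

```python
import requests

# Per-node heap figures from the _cat/nodes API; the host is a placeholder.
resp = requests.get(
    "http://localhost:9200/_cat/nodes",
    params={"v": "true", "h": "name,node.role,heap.current,heap.percent,heap.max"},
)
resp.raise_for_status()
print(resp.text)
```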

Shreyash,

First question: how have you configured replication for this cluster? That has a large effect on how much data is being stored per node.

Perhaps you have already read the blog post "How many shards should I have in my Elasticsearch cluster?", but if not it is probably worth a look.

Each shard has data that need to be kept in memory and use heap space. This includes data structures holding information at the shard level, but also at the segment level in order to define where data reside on disk. The size of these data structures is not fixed and will vary depending on the use-case. […] The more heap space a node has, the more data and shards it can handle.

Indices and shards are therefore not free from a cluster perspective, as there is some level of resource overhead for each index and shard.

In other words, you should expect that shards will use heap space even when they are not actively being queried or written to. Note that your heap dump shows many Elasticsearch "CacheSegment" objects. These may simply be what is required by your indices' mappings.
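If it helps, the node stats API can show how much of each node's heap the segments alone are holding. Here is a rough sketch in Python; the host and the absence of authentication are assumptions on my part:

```python
import requests

# Per-node segment memory from the node stats API; localhost:9200 is a
# placeholder for one of your nodes.
resp = requests.get("http://localhost:9200/_nodes/stats/indices/segments")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    seg = node["indices"]["segments"]
    memory_gib = seg["memory_in_bytes"] / 2**30
    terms_gib = seg["terms_memory_in_bytes"] / 2**30
    print(f"{node['name']}: segments.memory={memory_gib:.2f} GiB, terms={terms_gib:.2f} GiB")
```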

The reason I asked about replication is that it looks like your shard sizes are within the guidelines recommended by the "How many shards?" blog post, but if you have enabled replication, you may have more shards per node than recommended. It's very hard to say in the abstract, since so much depends on the data and mappings.

You may be interested in Index Lifecycle Management (ILM), a feature that was released after that blog post was written. If you have "old" indices that do not need to be queried frequently, you can let the cluster "freeze" them, which reduces the amount of heap space they use.
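If you want to experiment with freezing, it is a single request per index. A quick sketch (the index name and endpoint are placeholders, not taken from your cluster):

```python
import requests

# Freeze a rarely-searched index so it gives up most of its heap footprint
# (the 7.x freeze API). "my-old-index" and the host are placeholders.
resp = requests.post("http://localhost:9200/my-old-index/_freeze")
resp.raise_for_status()
print(resp.json())  # expect an acknowledgement
```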

Does this mean that your cluster's heap usage is expected behavior? In truth, I do not know. It would help to know a little bit more about your index replication policy and your use case. Do all or most of the 650 indices have the same mappings, like you would see if you were indexing logs and breaking up your data by time?

I hope some of this is helpful to you.

-William

Firstly, thanks for the detailed reply. I appreciate it.

Yes, replication is configured with a factor of 2 (two replicas per index).

My bad; specifically, it is 216 indices with 2 replicas each. That is 216 × (1 primary + 2 replicas) = 648 shards in total, i.e. 216 shards on each of the 3 data nodes. This is in line with the per-node numbers mentioned in the blog.

Yes, all my indices have the same mapping. My indices are not time-based; once indexed they are not modified, just searched.

I am exploring the ILM option.

Question:
The 216 shards allocated per node are well within the range proposed in the blog, so it is still not clear why each data node occupies ~10GB.
The CacheSegment usage you can see is just 25MB per node.
Is there a way to know what class "B" is?

I had the output of /_segments; what I did was sum up all the "memory_in_bytes" values across all nodes, which came to 23GB. Is this field the same as the heap memory used up by the segments?
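For reference, this is roughly how I did the summation (a sketch; the endpoint is a stand-in for my cluster):

```python
import requests

# Walk GET /_segments and add up every segment's "memory_in_bytes".
resp = requests.get("http://localhost:9200/_segments")
resp.raise_for_status()

total = 0
for index in resp.json()["indices"].values():
    for shard_copies in index["shards"].values():
        for copy in shard_copies:  # primary and replica copies
            for segment in copy["segments"].values():
                total += segment["memory_in_bytes"]

print(f"total segment memory: {total / 2**30:.2f} GiB")
```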

Have a look at this webinar which talks about optimizing for storage. This documentation is also a useful resource.

Shreyash,

Sorry for the delay in responding here.

I suspect that the class "B" in the heap dump denotes byte[], an array of bytes. Likewise, "J" is an array of longs and "S" is an array of shorts. (Those codes are described in the javadocs for java.lang.Class#getName.)

I see this in the docs:

Segments need to store some data into memory in order to be searchable efficiently. This number returns the number of bytes that are used for that purpose.

…so I believe that the answer to your second question is yes, the memory_in_bytes value shows how much heap space the segment is using.
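If you want to see what that figure is made of, the same stats break segment memory down by data structure (terms, doc values, points, norms, and so on). A rough sketch, with the same host/auth assumptions as before:

```python
import requests

# Break per-node segment memory down into its components; the keys ending
# in "_memory_in_bytes" are the individual data structures.
resp = requests.get("http://localhost:9200/_nodes/stats/indices/segments")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    seg = node["indices"]["segments"]
    parts = {k: v for k, v in seg.items() if k.endswith("_memory_in_bytes")}
    print(node["name"])
    for name, value in sorted(parts.items(), key=lambda kv: -kv[1]):
        print(f"  {name}: {value / 2**20:.1f} MiB")
```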

-William

Hi William,

I have the same problem: total heap consumed is 10 GB, of which 8 GB is consumed by segments (terms bytes).
I am not sure where the other 2 GB is going. The heap dump shows, as you mentioned, long and short arrays consuming about 800 MB and 200 MB respectively.

Can you please help me understand what other such information (long and short arrays) ES holds in the heap?