Heap dump: out of memory (Java heap space)


We are running a 3-node cluster (version 1.4.4, all nodes master-eligible) with a 30 GB heap on JRE 1.7.80, with the number of replicas set to 1.
Data: 2.5 TB (live)
Total: 25 TB
Shards per index: 3
Daily data: around 70-80 GB
A new index is created every day

We are seeing frequent garbage collections, sometimes running for hours, which make the cluster unstable every week.

Output of jmap -histo from one of the servers: https://pastebin.com/8gcQySx2

Can someone point me in the right direction?

Any help is appreciated.
Thanks in advance.

How many shards in total do you have?

You definitely need to upgrade anyway.

The number of primary shards is around 120.
Sorry to say, but upgrading is not an option in the current situation.
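
For a rough sense of scale, the numbers above can be turned into per-node shard counts. This is a minimal sketch, assuming "replication as 1" means number_of_replicas: 1 and taking 75 GB as the midpoint of the 70-80 GB daily volume; shard_stats is a hypothetical helper, not an Elasticsearch API.

```python
# Back-of-the-envelope shard arithmetic (hypothetical helper, not an Elasticsearch API).
# Assumption: "replication as 1" means number_of_replicas = 1.

def shard_stats(primaries, replicas, nodes, daily_gb, shards_per_index):
    """Return (total shards, shards per node, GB per primary shard of a daily index)."""
    total_shards = primaries * (1 + replicas)    # each primary plus its replica copies
    shards_per_node = total_shards / nodes       # assumes an even spread across nodes
    gb_per_primary = daily_gb / shards_per_index
    return total_shards, shards_per_node, gb_per_primary


# Compare the current 3 nodes with a couple of larger cluster sizes.
for n in (3, 4, 5):
    print(n, shard_stats(primaries=120, replicas=1, nodes=n, daily_gb=75, shards_per_index=3))
```

Adding nodes, as suggested later in the thread, spreads the same 240 shards (and the heap they consume) more thinly.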

Can someone explain what the top 5-6 items in the jmap -histo output, which use a large chunk of memory, mean in terms of Elasticsearch?
Link - https://pastebin.com/8gcQySx2

Doc values were introduced early in the Elasticsearch 1.x series, but were initially quite slow. Performance improved steadily, and by version 1.7 it was good enough that we could enable them by default in Elasticsearch 2.0.

If you have fields that you need to aggregate on but not perform free-text search on, you can save a lot of heap space by mapping these using doc_values.
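
In 1.x, doc values are enabled per field in the mapping. Because a field's doc_values setting cannot be changed on an existing index, an index template is a natural place to put it when new indices are created daily. The sketch below builds a template body that could be sent with PUT /_template/<name>. The index pattern logs-*, the type name event, and the field names are hypothetical placeholders.

```python
import json

# Hypothetical names: adjust the pattern, type and fields to your own daily indices.
INDEX_PATTERN = "logs-*"
DOC_TYPE = "event"
AGG_FIELDS = ["device_type", "host_name", "action"]


def doc_values_template(pattern, doc_type, fields):
    """Build an ES 1.x index template body that maps each field as a
    not_analyzed string backed by doc values (stored on disk, not on the heap)."""
    properties = {
        name: {"type": "string", "index": "not_analyzed", "doc_values": True}
        for name in fields
    }
    return {
        "template": pattern,  # 1.x templates select indices by this name pattern
        "mappings": {doc_type: {"properties": properties}},
    }


body = doc_values_template(INDEX_PATTERN, DOC_TYPE, AGG_FIELDS)
print(json.dumps(body, indent=2))  # body for PUT /_template/<name>
```

Only indices created after the template is in place are affected. Existing daily indices keep using fielddata until they age out.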

What does your data and mappings look like? Are you using doc values?

No, we are not using doc values.

A daily index typically looks like:

There is only one type in the daily indices, and we don't aggregate on analyzed fields.
The data we capture are logs generated by devices such as firewalls and Windows and Linux hosts.

Not-analysed fields are still kept on the heap (as fielddata) unless you use doc values, so I suspect this is one of the things driving your heap usage.

I recall heap pressure dropping significantly when users switched to doc_values, allowing much larger data volumes to be stored per node. If I remember correctly, though (this was a long time ago, as this is a very old version), doc values require aggregations rather than facets, so Kibana3 may not work with doc values.

Currently, heap usage on one of the nodes is 65 percent and fielddata is 1.4 GB, and the same holds for all of the nodes.
We have even added scripts to clear the cache once heap usage reaches 80-85 percent.
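
A cache-clearing script like this comes down to reading heap usage from the nodes stats API and acting above a threshold. The sketch below covers only the decision part. It assumes the JSON returned by GET /_nodes/stats/jvm,indices has already been fetched and parsed into a dict, and it reads only nodes.<id>.name, jvm.mem.heap_used_percent and indices.fielddata.memory_size_in_bytes. The sample values are made up.

```python
# Decide which nodes need attention from a parsed nodes-stats response.
# Assumed shape (ES 1.x): {"nodes": {<id>: {"name": ..., "jvm": {"mem": {"heap_used_percent": ...}},
#                                          "indices": {"fielddata": {"memory_size_in_bytes": ...}}}}}

def nodes_over_threshold(stats, threshold_percent=80):
    """Return (node name, heap %, fielddata bytes) for nodes at or above the threshold."""
    hot = []
    for node_id, node in stats["nodes"].items():
        heap_pct = node["jvm"]["mem"]["heap_used_percent"]
        fielddata = (node.get("indices", {})
                         .get("fielddata", {})
                         .get("memory_size_in_bytes", 0))
        if heap_pct >= threshold_percent:
            hot.append((node.get("name", node_id), heap_pct, fielddata))
    return hot


# Made-up sample response for two nodes.
sample = {
    "nodes": {
        "a1": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 65}},
               "indices": {"fielddata": {"memory_size_in_bytes": 1_400_000_000}}},
        "b2": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 83}},
               "indices": {"fielddata": {"memory_size_in_bytes": 1_500_000_000}}},
    }
}

for name, pct, fd in nodes_over_threshold(sample):
    # A real script might now send POST /_cache/clear?fielddata=true
    print(f"{name}: heap {pct}%, fielddata {fd / 1e9:.1f} GB")
```

Recording fielddata size alongside heap percent shows how much of the used heap fielddata actually accounts for.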

Like David, I would recommend upgrading. I do not remember enough to offer any other suggestions at this point.


If you don't want to upgrade, another short-term solution is to start new nodes.


OK, I will keep that in mind.
Meanwhile, is there anything we can do, other than adding nodes, to make the heap issue occur less frequently?

Some ideas:

  • Don't run aggregations
  • Don't sort
  • Remove old indices
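
With daily indices, removing old ones is mostly date arithmetic on index names. This is a minimal sketch, assuming hypothetical names of the form logs-YYYY.MM.DD. Each selected name would then be removed with DELETE /<index>.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, today, keep_days, prefix="logs-", fmt="%Y.%m.%d"):
    """Return daily indices (named prefix + date) older than keep_days, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # skip non-daily indices such as .kibana
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            continue  # prefix matched but the suffix is not a date in the expected format
        if day < cutoff:
            old.append(name)
    return sorted(old)  # zero-padded dates sort chronologically


existing = ["logs-2017.06.01", "logs-2017.06.20", "logs-2017.06.29", ".kibana"]
print(indices_to_delete(existing, today=date(2017, 6, 30), keep_days=14))
```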

OK, one last question: do you recommend enabling doc values on ES 1.4.4 to resolve, or at least delay, the heap issue?

How are you querying your data? Kibana3? Kibana4? Aggregations? Facets? Just searches?

Aggregations and searches with sorts, using Python.

Automated queries never cover more than the last 24 hours of data, so at most 2 indices are queried.
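
Because the automated queries only cover the last 24 hours, they can name the one or two daily indices involved instead of searching every index. This keeps sorts and aggregations from touching older shards. The sketch below assumes the same hypothetical logs-YYYY.MM.DD naming with one index per UTC day, plus hypothetical field names @timestamp and device_type. The body uses the 1.x filtered query.

```python
import json
from datetime import datetime, timedelta


def indices_for_window(now, hours=24, prefix="logs-", fmt="%Y.%m.%d"):
    """Daily index names covering [now - hours, now], assuming one index per UTC day."""
    day = (now - timedelta(hours=hours)).date()
    names = []
    while day <= now.date():
        names.append(prefix + day.strftime(fmt))
        day += timedelta(days=1)
    return names


def last_24h_body(agg_field="device_type", time_field="@timestamp"):
    """ES 1.x search body: time-range filter, a terms aggregation and a sort."""
    return {
        "query": {"filtered": {"filter": {"range": {time_field: {"gte": "now-24h"}}}}},
        "aggs": {"by_field": {"terms": {"field": agg_field}}},
        "sort": [{time_field: {"order": "desc"}}],
    }


now = datetime(2017, 6, 30, 10, 0)
target = ",".join(indices_for_window(now))  # path for GET /<target>/_search
print(target)
print(json.dumps(last_24h_body()))
```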

If you are not using facets, you should (as far as I can recall) be able to switch to doc values for not-analysed fields and save heap space. If this causes performance problems, it could be worthwhile upgrading to version 1.7.6, as a number of performance improvements were introduced throughout the series.


OK, thank you for the thumbs up. We are not using facets, so I will enable doc values and keep monitoring. If results only take longer to come back, that is not an issue for me.
Going by your and David's suggestions, we are planning to upgrade to the 2.x series.

I would recommend testing it properly first, though, as my memory of something this far back is a bit hazy...
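
Part of that testing can be automated by checking that new daily indices actually came up with doc values. This is a minimal sketch, assuming the parsed response of GET /<index>/_mapping has the 1.x shape {index: {"mappings": {type: {"properties": ...}}}}. It lists not_analyzed string fields that still lack doc values. The sample mapping is made up.

```python
def fields_missing_doc_values(mapping_response):
    """Return "index/type/field" for not_analyzed string fields without doc_values."""
    missing = []

    def walk(properties, prefix, path):
        for name, spec in properties.items():
            field = path + name
            if "properties" in spec:  # object field: recurse into its sub-fields
                walk(spec["properties"], prefix, field + ".")
            elif spec.get("type") == "string" and spec.get("index") == "not_analyzed":
                if not spec.get("doc_values", False):
                    missing.append(prefix + field)

    for index_name, index_body in mapping_response.items():
        for doc_type, type_body in index_body.get("mappings", {}).items():
            walk(type_body.get("properties", {}), f"{index_name}/{doc_type}/", "")
    return sorted(missing)


sample_mapping = {
    "logs-2017.06.30": {"mappings": {"event": {"properties": {
        "action": {"type": "string", "index": "not_analyzed", "doc_values": True},
        "host_name": {"type": "string", "index": "not_analyzed"},
        "message": {"type": "string"},
        "geo": {"properties": {"country": {"type": "string", "index": "not_analyzed"}}},
    }}}}
}
print(fields_missing_doc_values(sample_mapping))
```

An empty list for a freshly created daily index is a reasonable signal that the template was applied before rolling it out to production.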

OK, I will test it and only then apply it in production.