Heap dump - OutOfMemoryError: Java heap space


(Akash Narone) #1

Greetings

We are running a 3-node cluster (version 1.4.4, all nodes master-eligible) with a 30 GB heap per node, on JRE 1.7.0_80, with replication set to 1.
Live data: 2.5 TB
Total data: 25 TB
Shards per index: 3
Daily data volume: around 70-80 GB
New indices are created every day.

We are having frequent GC runs, sometimes lasting for hours, which make the cluster unstable every week or so.

Output of jmap -histo from one of the servers: https://pastebin.com/8gcQySx2

Can someone point me in the right direction?

Any help is appreciated.
Thanks in advance.


(David Pilato) #2

How many shards in total do you have?

You definitely need to upgrade anyway.


(Akash Narone) #3

The number of primary shards is around 120, so roughly 240 shards in total with replication set to 1.
Sorry to say, but upgrading is not an option in the current situation.


(Akash Narone) #4

Can someone explain what the top 5-6 items using large chunks of memory in the jmap -histo output mean in terms of Elasticsearch?
Link: https://pastebin.com/8gcQySx2


(Christian Dahlqvist) #5

Doc values were introduced early in the Elasticsearch 1.x series, but were initially quite slow. Performance improved so much over the 1.x releases that, by version 1.7, they were good enough to be enabled by default in Elasticsearch 2.0.

If you have fields that you need to aggregate on but do not perform free-text search on, you can save a lot of heap space by mapping them with doc_values.
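
For example, something along these lines in a 1.x index template would enable doc values for a not_analyzed string field (a sketch only; the template, index pattern, type, and field names here are made up, so substitute your own):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Sketch: a 1.x index template that maps a not_analyzed string field with
# doc values, so aggregating/sorting on it reads from disk instead of
# building heap-resident fielddata. All names below are hypothetical.
es.indices.put_template(
    name="logs-template",
    body={
        "template": "logs-*",  # applied to each new daily index
        "mappings": {
            "logs": {  # the single document type
                "properties": {
                    "src_ip": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    }
                }
            }
        },
    },
)
```

Existing indices keep their old mapping, so with daily indices the change takes effect from the next day's index onwards.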

What do your data and mappings look like? Are you using doc values?


(Akash Narone) #6

No, we are not using doc values.

A daily index typically looks like:
https://pastebin.com/EJUFpxqK

There is only one type in the daily indices, and we don't aggregate on analyzed fields.
The data we capture consists of logs generated by devices such as firewalls and Windows and Linux machines.


(Christian Dahlqvist) #7

not_analyzed fields are still loaded onto the heap as fielddata unless you use doc values, so I suspect that is one thing driving your heap usage.

I recall heap pressure dropping significantly when users switched to doc_values, allowing much larger data volumes to be stored per node. If I remember correctly though (this was a long time ago, as this is a very old version), doc values require the use of aggregations rather than facets, so Kibana 3 may not work with doc values.


(Akash Narone) #8

Currently, on one of the nodes, heap usage is at 65 percent and fielddata is 1.4 GB, and this holds true for all of the nodes.
We have even added scripts to clear the cache once heap usage reaches 80-85 percent.
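
A minimal sketch of what such a script can look like (the node address and threshold are assumptions; it checks the nodes stats API and clears the fielddata cache over REST):

```python
import requests

ES = "http://localhost:9200"  # assumption: any node in the cluster
THRESHOLD = 80                # clear the cache at 80 percent heap usage

# Meant to be run periodically (e.g. from cron). The nodes stats API
# reports heap_used_percent per node in 1.x.
nodes = requests.get(ES + "/_nodes/stats/jvm").json()["nodes"]

if any(n["jvm"]["mem"]["heap_used_percent"] >= THRESHOLD for n in nodes.values()):
    # Drop the fielddata cache across all indices to relieve heap pressure.
    # This is a stopgap: the cache is simply rebuilt by the next
    # aggregation or sort unless the fields use doc values.
    requests.post(ES + "/_cache/clear", params={"fielddata": "true"})
```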


(Christian Dahlqvist) #9

I would, like David, recommend upgrading. I do not really remember enough to have any other suggestions at this point.


(David Pilato) #10

If you don't want to upgrade, another short-term solution is to start new nodes.


(Akash Narone) #11

Ok, I will keep that in mind.
Meanwhile, is there anything we can do to make the heap issue occur less frequently, other than adding nodes?


(David Pilato) #12

Some ideas:

  • Don't run aggregations
  • Don't sort
  • Remove old indices (see the sketch below)
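
For the last item, a minimal sketch of a daily cleanup job (the index name pattern, retention period, and node address are assumptions; Curator was the usual tool for this at the time):

```python
from datetime import datetime, timedelta
import requests

ES = "http://localhost:9200"  # assumption: any node in the cluster
PREFIX = "logs-"              # assumption: daily indices named logs-YYYY.MM.DD
RETENTION_DAYS = 30           # assumption: how much history to keep

cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)

# The cat API lists matching index names as plain text, one per line.
listing = requests.get(ES + "/_cat/indices/" + PREFIX + "*", params={"h": "index"})

for name in listing.text.split():
    try:
        day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d")
    except ValueError:
        continue  # not a daily index, leave it alone
    if day < cutoff:
        requests.delete(ES + "/" + name)  # deleting a whole index is cheap
```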

(Akash Narone) #13

Ok, one last question: do you recommend enabling doc values on ES 1.4.4 to resolve, or at least delay, the heap issue?


(Christian Dahlqvist) #14

How are you querying your data? Kibana 3? Kibana 4? Aggregations? Facets? Just searches?


(Akash Narone) #15

Aggregations and searches with sorts, using Python.
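
Roughly like this (a sketch; the index and field names are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# A terms aggregation on a not_analyzed field (hypothetical names). Without
# doc values, this loads the whole field into heap-resident fielddata.
es.search(
    index="logs-2015.06.01",
    body={
        "size": 0,
        "aggs": {"top_sources": {"terms": {"field": "src_ip", "size": 10}}},
    },
)

# A search sorted on a timestamp field; sorting also builds fielddata
# unless the field is mapped with doc values.
es.search(
    index="logs-2015.06.01",
    body={
        "query": {"match": {"message": "error"}},
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": 50,
    },
)
```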


(Akash Narone) #16

Automated queries never run against data older than 24 hours, so at most two daily indices will be queried.
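
So each run only ever needs the current and previous day's index names, e.g. (the naming pattern is an assumption):

```python
from datetime import datetime, timedelta

# Assumption: daily indices are named logs-YYYY.MM.DD.
now = datetime.utcnow()
indices = ",".join(
    "logs-" + day.strftime("%Y.%m.%d")
    for day in (now, now - timedelta(days=1))
)
# e.g. "logs-2015.06.02,logs-2015.06.01", passed as the index
# parameter of the search request
```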


(Christian Dahlqvist) #17

If you are not using facets, you should (as far as I can recall) be able to switch to doc values for not_analyzed fields and save on heap space. If this causes performance problems, it could be worthwhile upgrading to version 1.7.6, as a number of performance improvements were introduced throughout the 1.x series.


(Akash Narone) #18

Ok, thank you for the thumbs up. We are not using facets, so I will enable doc values and keep monitoring. If it only takes more time to return results, that is not an issue for me.
Going by your and David's suggestions, we are planning to upgrade to the 2.x series.


(Christian Dahlqvist) #19

I would recommend testing it properly first though, as my memory stretching this far back is a bit hazy...


(Akash Narone) #20

Ok, I will test it and only then apply it in production.