Troubleshooting High heap usage

animageofmine · February 6, 2018, 4:03am

I have looked at many posts in the discussion forum and googled quite a bit as well. Either these posts are for older versions or there wasn't anything that matched the issue I am seeing. Hence, posting it one more time. Apologies if there is something really obvious that I am overlooking.

There are 5 data nodes in our cluster and data seems to be distributed evenly, but only one or two data nodes consistently have high heap usage (between 85-90%). Can someone help me out with troubleshooting steps? A few things I have looked at:

Cluster health / unassigned shards : 0
Number of pending tasks: 0
Script stats: look normal
Field data: between 10-15% of memory

Data node configuration: 16 cores, 30 GB RAM, 15 GB each for Lucene and ES
ES version: 5.3.2

I couldn't interpret anything meaningful from other stats. Please see node stats in my onedrive

We don't have X-Pack, and installing it in production is out of scope as of now. Let me know if you need more info.
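
For readers who want to make the same per-node comparison without X-Pack, here is a minimal sketch. It assumes you have already fetched `GET _nodes/stats` and parsed the JSON into a Python dict. The helper names (`pct`, `summarize_heap`, `hot_nodes`), the 85% threshold, and the sample values are made up for illustration and are not taken from the poster's cluster.

```python
def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal; 0.0 if `whole` is 0."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def summarize_heap(node_stats):
    """Return one row per node, sorted by heap usage (highest first)."""
    rows = []
    for node_id, node in node_stats["nodes"].items():
        mem = node["jvm"]["mem"]
        indices = node.get("indices", {})
        heap_max = mem["heap_max_in_bytes"]
        fielddata = indices.get("fielddata", {}).get("memory_size_in_bytes", 0)
        segments = indices.get("segments", {}).get("memory_in_bytes", 0)
        rows.append({
            "name": node.get("name", node_id),
            "heap_used_percent": mem["heap_used_percent"],
            # Share of the maximum heap held by fielddata and segment memory.
            "fielddata_pct": pct(fielddata, heap_max),
            "segments_pct": pct(segments, heap_max),
        })
    return sorted(rows, key=lambda r: r["heap_used_percent"], reverse=True)


def hot_nodes(rows, threshold_pct=85):
    """Names of nodes at or above the (assumed) heap threshold."""
    return [r["name"] for r in rows if r["heap_used_percent"] >= threshold_pct]


# Invented sample shaped like a trimmed `_nodes/stats` response.
GB = 1024 ** 3
sample = {"nodes": {
    "a1": {"name": "data-1",
           "jvm": {"mem": {"heap_used_percent": 88, "heap_max_in_bytes": 15 * GB}},
           "indices": {"fielddata": {"memory_size_in_bytes": 2 * GB},
                       "segments": {"memory_in_bytes": 3 * GB}}},
    "b2": {"name": "data-2",
           "jvm": {"mem": {"heap_used_percent": 55, "heap_max_in_bytes": 15 * GB}},
           "indices": {"fielddata": {"memory_size_in_bytes": 2 * GB},
                       "segments": {"memory_in_bytes": 1 * GB}}},
}}

rows = summarize_heap(sample)
for row in rows:
    print(row)
print("hot nodes:", hot_nodes(rows))
```

If fielddata and segment memory look similar across nodes while heap usage does not, as in the invented sample's fielddata column, the difference is probably coming from something these counters do not cover.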

Christian_Dahlqvist · February 6, 2018, 6:55am

What is the full output of the cluster stats API?

animageofmine · February 6, 2018, 8:23am

I had to reboot the node. Please find cluster stats output here

We are working on reducing the number of shards, but that does not seem to be the issue, since the other nodes are holding up fine (we have another cluster with over 25k shards, and that is also doing alright).

loren · February 6, 2018, 6:37pm

Are you doing a lot of updates to existing documents? I see a lot of deleted docs reported in your cluster stats.

I ran into a similar issue recently where one or two data nodes would have high heap usage and the high CPU that goes along with constant garbage collections.

The culprit was lots of updates to existing documents.

animageofmine · February 8, 2018, 7:27am

@loren We update in batches (bulk). Our updates essentially overwrite the whole document (no partial updates). Since ES handles an update as a delete followed by an index, I guess it does end up deleting a lot of documents.
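
To put a number on "a lot of deleted docs", one option is to compare `docs.deleted` with `docs.count` per node. This is a rough sketch and assumes a parsed `GET _nodes/stats` response. The function names and sample figures are hypothetical, and what counts as a "high" ratio is a judgment call that depends on the workload.

```python
def deleted_ratio(docs):
    """Fraction of documents marked deleted but not yet merged away."""
    count = docs.get("count", 0)
    deleted = docs.get("deleted", 0)
    total = count + deleted
    return deleted / total if total else 0.0


def deleted_ratio_by_node(node_stats):
    """Map node name -> deleted-docs ratio from a parsed `_nodes/stats` dict."""
    return {
        node.get("name", node_id): deleted_ratio(node["indices"]["docs"])
        for node_id, node in node_stats["nodes"].items()
    }


# Invented sample: data-1 carries far more deleted docs than data-2.
sample = {"nodes": {
    "a1": {"name": "data-1", "indices": {"docs": {"count": 600, "deleted": 400}}},
    "b2": {"name": "data-2", "indices": {"docs": {"count": 900, "deleted": 100}}},
}}

ratios = deleted_ratio_by_node(sample)
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.0%} deleted")
```

The same `deleted_ratio` helper also works on the `indices.docs` object from the cluster stats output, which gives a cluster-wide figure instead of a per-node one.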

In your thread you seemed to be using X-Pack. Are there particular performance counters / metrics that you looked at? We don't use X-Pack, but we have installed the Telegraf plugin.

loren · February 8, 2018, 5:56pm

In my case X-Pack was not of much help diagnosing the problem as it doesn't graph per-node merge rates. I used iostat on the busy node to see that the write rate was 4X more than other nodes and took a guess that the problem was due to frequent merges. I reduced the number of updates by a wide margin, and this solved my problem.

Good luck!
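
As an alternative to eyeballing iostat, per-node merge activity can be estimated from two node stats snapshots taken some interval apart. The sketch below assumes both snapshots are parsed `GET _nodes/stats` responses and uses the cumulative `indices.merges.total_size_in_bytes` counter. The helper names, the 3x-median rule, and the sample numbers are assumptions for illustration, not what loren actually ran.

```python
from statistics import median


def merge_bytes_by_node(node_stats):
    """Map node name -> cumulative merged bytes from a parsed `_nodes/stats` dict."""
    return {
        node.get("name", node_id): node["indices"]["merges"]["total_size_in_bytes"]
        for node_id, node in node_stats["nodes"].items()
    }


def merge_rates(before, after, interval_seconds):
    """Merged bytes per second for each node present in both snapshots."""
    b = merge_bytes_by_node(before)
    a = merge_bytes_by_node(after)
    # The counter resets when a node restarts, so clamp negative deltas to 0.
    return {
        name: max(0, a[name] - b[name]) / interval_seconds
        for name in a if name in b
    }


def merge_outliers(rates, factor=3.0):
    """Nodes merging more than `factor` times the median rate (assumed rule)."""
    baseline = median(rates.values())
    return sorted(n for n, r in rates.items() if r > 0 and r > factor * baseline)


def snapshot(sizes):
    """Hypothetical helper: build a minimal stats dict from {name: merged_bytes}."""
    return {"nodes": {
        name: {"name": name, "indices": {"merges": {"total_size_in_bytes": size}}}
        for name, size in sizes.items()
    }}


# Invented snapshots taken 60 seconds apart.
before = snapshot({"data-1": 1_000, "data-2": 1_000, "data-3": 1_000})
after = snapshot({"data-1": 5_800, "data-2": 2_200, "data-3": 2_200})

rates = merge_rates(before, after, interval_seconds=60)
print(rates)
print("merge outliers:", merge_outliers(rates))
```

A node that stands out here is only a hint that merging differs on that node; it does not prove that merges are the cause of the heap pressure.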

system · March 8, 2018, 5:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
