Heap usage of client node is continuously increasing as I run queries

ES 5.2.2, Ubuntu 16.04, Oracle JRE8
I have 6 m4.large nodes holding 120 M docs, split across nearly 70 indices, each with 5 shards and 1 replica.
Queries only touch the latest data, about 1 M docs, selected using time filters.

This is a test cluster with no real traffic.
request_cache is only about 5-10 MB when the heap usage is at around 3 GB.

When I simulate a couple of queries (https://gist.github.com/vanga/2cd8e1fd7c3b2bffa89fda8ce3a8a481), heap usage increases until an old GC runs, after which the heap comes back to normal. With real traffic, however, the frequent old GCs are making the cluster unusable.

This increase is only on the node I am making the request to. I have also force merged segments to max_num_segments=2.
Field data is not significant.

Any leads on how to debug this further would be helpful.

Thanks

Do you have Monitoring installed? If not, starting there would be a good idea as you can then see what's going on with a bit more detail.

I have monitoring installed, but it doesn't show a lot of metrics.
I think only the heap usage metric is relevant here.

I checked the other metrics from the node stats API.

To collate the observations in one place:
Heap usage keeps increasing.
Old GCs are happening frequently.
Request cache, query cache and fielddata are each only a few MB.
Segments have been force merged with max_num_segments=2, and the data is constant.
Heap usage only increases on the node I am sending the requests to.
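
To make the observations above easier to track over time, the relevant figures can be pulled out of the node stats response in one go. This is only a minimal sketch: the function names and the localhost base URL are my own placeholders, and the field names are the ones I see in the node stats output on ES 5.x.

```python
import json
from urllib.request import urlopen


def summarize_node(node):
    """Pull the heap, old GC, and cache figures out of one node's stats entry."""
    jvm = node["jvm"]
    indices = node["indices"]
    old_gc = jvm["gc"]["collectors"]["old"]
    return {
        "heap_used_bytes": jvm["mem"]["heap_used_in_bytes"],
        "old_pool_used_bytes": jvm["mem"]["pools"]["old"]["used_in_bytes"],
        "old_gc_count": old_gc["collection_count"],
        "old_gc_time_ms": old_gc["collection_time_in_millis"],
        "request_cache_bytes": indices["request_cache"]["memory_size_in_bytes"],
        "query_cache_bytes": indices["query_cache"]["memory_size_in_bytes"],
        "fielddata_bytes": indices["fielddata"]["memory_size_in_bytes"],
        "segment_count": indices["segments"]["count"],
    }


def fetch_summaries(base_url="http://localhost:9200"):
    """Fetch a summary per node; base_url is a placeholder for the cluster address."""
    with urlopen(base_url + "/_nodes/stats/jvm,indices") as resp:
        stats = json.load(resp)
    # The response maps node ids to per-node stats; key the result by node name.
    return {n["name"]: summarize_node(n) for n in stats["nodes"].values()}
```

Calling fetch_summaries() every few seconds while the queries run should show whether old_pool_used_bytes and old_gc_count climb only on the node receiving the requests.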

Please let me know if you have any specific metrics in mind, I can take a look at them.

Thanks.

Here is the node status after running the two queries in the gist every 2 seconds.
Note: although the data is constant, the queries may not be exactly the same in each run. The order of the terms list might change between runs, as the queries are auto-generated.
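
Since the order of the terms list can differ between runs, successive requests are not byte-identical, which makes runs harder to compare (and, as far as I understand the documentation, the shard request cache uses the JSON body as its key). One way to rule this out while debugging is to canonicalize the generated query before sending it. The sketch below assumes the queries are available as plain Python dicts; the function names and the user_id field in the usage note are hypothetical.

```python
import json


def canonicalize(query):
    """Return a copy of the query with the value lists of terms clauses sorted."""
    if isinstance(query, list):
        return [canonicalize(item) for item in query]
    if not isinstance(query, dict):
        return query
    result = {}
    for key, value in query.items():
        if key == "terms" and isinstance(value, dict):
            # A terms query maps a field name to a list of values; sort that list.
            # repr is used as the sort key so mixed value types do not raise.
            result[key] = {
                field: sorted(v, key=repr) if isinstance(v, list) else canonicalize(v)
                for field, v in value.items()
            }
        else:
            result[key] = canonicalize(value)
    return result


def to_body(query):
    """Serialize with sorted keys so equivalent queries produce the same string."""
    return json.dumps(canonicalize(query), sort_keys=True)
```

For example, to_body({"query": {"terms": {"user_id": [3, 1, 2]}}}) and the same query with the values in the order [2, 3, 1] produce the same request body.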

I have also updated the gist with the output of node stats.

Thanks.

I am not an expert on the JVM or GC, but here are my observations.

I notice that as soon as I run a query, the old memory pool size increases.
Isn't it strange that objects get moved to the old pool so fast?
Is there anything in ES that forces objects to be moved to the old pool straight away? (I believe it's the JVM that handles this memory management and ES has no control over it.)

What I did:
I restarted my node, noted the sizes of the different memory pools, ran the query, and saw that the old pool size had increased. There was no traffic other than the query I made.
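
The before/after comparison described above can be done mechanically on two node stats snapshots. This is a rough sketch, assuming each snapshot is the stats entry for a single node as returned under "nodes" by the node stats API; the function names are my own.

```python
def pool_usage(node):
    """Map each JVM memory pool name (young, survivor, old) to its used bytes."""
    pools = node["jvm"]["mem"]["pools"]
    return {name: pool["used_in_bytes"] for name, pool in pools.items()}


def pool_delta(before, after):
    """Return the change in used bytes per pool between two snapshots."""
    used_before = pool_usage(before)
    used_after = pool_usage(after)
    # A pool missing from the first snapshot is treated as starting at zero.
    return {name: used_after[name] - used_before.get(name, 0) for name in used_after}
```

A positive delta for the old pool right after a single query, with no old GC in between, corresponds to the growth I am seeing.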

Is something like increasing the young pool size a solution?
