Track Queries with High Heap Usage

Hi All,

I was wondering what the best way is to track queries based on their heap usage.

Context: I've recently had to increase the heap allocation on the coordinating nodes (3 nodes in the cluster) from 10GB RAM / 8GB heap to 20GB RAM / 18GB heap, because they were frequently hitting circuit breakers. The cluster has become more actively used, so while I would have expected to increase the heap a little, I wouldn't have expected to increase it this much.

I suspect that a few newer queries are using far more heap than I would expect, but I haven't found a reliable way to track queries by heap usage.

I looked at the slow log, but I don't think that's the right approach, as the queries being executed aren't slow, they just (potentially) use a lot of heap.
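For reference, the closest I've found so far is polling the per-node circuit breaker stats, which at least shows which breaker is tripping and on which node. A rough sketch in Python (the endpoint, credentials, and TLS handling below are placeholders for illustration, not my actual setup):

```python
# Rough sketch: dump per-node circuit breaker stats to see which breaker is
# tripping and on which node. URL, credentials and TLS handling are placeholders.
import requests

ES_URL = "https://localhost:9200"   # placeholder: your cluster endpoint
AUTH = ("elastic", "changeme")      # placeholder: your credentials

resp = requests.get(f"{ES_URL}/_nodes/stats/breaker", auth=AUTH, verify=False)
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    for breaker, stats in node["breakers"].items():
        print(
            f"{node['name']:<20} {breaker:<25} "
            f"estimated={stats['estimated_size']:>10} "
            f"limit={stats['limit_size']:>10} "
            f"tripped={stats['tripped']}"
        )
```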

The queries are all from Kibana features (Observability rules, SIEM rules), so I don't have much control over, or ability to debug, the actual queries being executed.

Setup:
Elasticsearch Version: 7.17.2
Kibana Version: 7.17.2
Install Method: Kubernetes/Containers/ECK

Usually, queries with lots of aggregations and sorts tend to cause higher CPU and memory usage.

What kind of queries do you have in your application?

Have you seen out-of-memory issues on any node? One way to find such queries would be to analyse the heap dumps generated in OOM scenarios.
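If you want to see what is actually running, the tasks API with `detailed=true` includes the query source in each search task's description. Something roughly like this (connection details are placeholders):

```python
# Rough sketch: list in-flight search tasks with their query source.
# URL and credentials are placeholders.
import requests

ES_URL = "https://localhost:9200"   # placeholder
AUTH = ("elastic", "changeme")      # placeholder

resp = requests.get(
    f"{ES_URL}/_tasks",
    params={"actions": "*search*", "detailed": "true", "group_by": "parents"},
    auth=AUTH,
    verify=False,
)
resp.raise_for_status()

for task_id, task in resp.json().get("tasks", {}).items():
    runtime_ms = task["running_time_in_nanos"] // 1_000_000
    # 'description' contains the target indices and the query body
    print(f"{task_id}  {runtime_ms} ms")
    print(" ", task.get("description", "")[:500])
```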

Hi @DineshNaik, the queries all come from Kibana "Rules". The main rules we use are the Metrics rules under Observability and a few Log Threshold rules, also under Observability. We also have SIEM rules (mainly the prebuilt ones), but those haven't really changed since the noticeable uptick in heap usage, so it leads me to believe that some of the Metrics or Log Threshold rules are causing the issue.

Regarding OOM kills: so far I haven't noticed any, but I have seen the coordinating nodes lock up for several minutes, close to the point of being OOM killed, as the heap takes a while to free up.
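One rough way to watch this is polling the node JVM stats and keeping an eye on heap_used_percent on the coordinating nodes, for example (connection details are placeholders):

```python
# Rough sketch: poll heap_used_percent per node to see how long heap stays
# pinned after a burst of rule executions. URL and credentials are placeholders.
import time
import requests

ES_URL = "https://localhost:9200"   # placeholder
AUTH = ("elastic", "changeme")      # placeholder

while True:
    resp = requests.get(f"{ES_URL}/_nodes/stats/jvm", auth=AUTH, verify=False)
    resp.raise_for_status()
    for node in resp.json()["nodes"].values():
        heap = node["jvm"]["mem"]
        print(f"{node['name']:<20} heap_used={heap['heap_used_percent']}%")
    time.sleep(30)
```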

What about date ranges? Has the data grown drastically, and are you querying all of it?

Regarding date ranges, most of the queries only look back over the last ~5 minutes.

The data has grown a bit, but not what I'd consider drastic. (Context: before the issue the cluster handled ~50k events/s; it now handles ~65k events/s, so I wouldn't expect the heap requirement to double.)

What are your compute configurations, infrastructure-wise?
In the slow logs, have you checked whether some queries are taking more time than usual?
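Even if the queries aren't slow, you could temporarily lower the search slow log thresholds on the indices those rules query, so that every query gets logged and its body can be inspected. Roughly (index pattern and connection details are placeholders):

```python
# Rough sketch: lower search slow log thresholds so every query is logged.
# Index pattern, URL and credentials are placeholders.
import requests

ES_URL = "https://localhost:9200"   # placeholder
AUTH = ("elastic", "changeme")      # placeholder

settings = {
    "index.search.slowlog.threshold.query.warn": "2s",
    "index.search.slowlog.threshold.query.info": "0ms",  # "0ms" logs every query at info
}

resp = requests.put(
    f"{ES_URL}/metrics-*/_settings",   # placeholder index pattern
    json=settings,
    auth=AUTH,
    verify=False,
)
resp.raise_for_status()
print(resp.json())
```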
