ES 7.4 GC keeps reclaiming less memory on each pass

tyler-3 · April 23, 2020, 8:58pm

Hello, I recently upgraded to 7.4 and I'm having a problem where the heap reclaimed in the sawtooth pattern keeps shrinking (e.g., initially GC takes the node from 75% down to 20%, but then only down to 30%, then 40%, and so on with each additional collection).

Is there a query that will show how much heap each index is using?
How would I go about debugging this?
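
One way to confirm the pattern described above is to sample heap usage over time and check whether the post-GC floor keeps climbing. The sketch below is only an illustration: it assumes you already collect a series of heap percentages (for example `jvm.mem.heap_used_percent` from `GET _nodes/stats/jvm`), and the function names and sample numbers are made up for this example.

```python
from typing import List


def post_gc_troughs(heap_pct: List[float]) -> List[float]:
    """Return the local minima of a heap-usage series (the bottom of each sawtooth)."""
    troughs = []
    for i in range(1, len(heap_pct) - 1):
        # A trough is lower than the previous sample and no higher than the next one.
        if heap_pct[i] < heap_pct[i - 1] and heap_pct[i] <= heap_pct[i + 1]:
            troughs.append(heap_pct[i])
    return troughs


def floor_is_rising(heap_pct: List[float], min_troughs: int = 3) -> bool:
    """True if every post-GC trough is higher than the one before it."""
    troughs = post_gc_troughs(heap_pct)
    if len(troughs) < min_troughs:
        return False  # not enough GC cycles observed to say anything
    return all(later > earlier for earlier, later in zip(troughs, troughs[1:]))


# Illustrative samples only: heap used (%) over time for one node.
samples = [40, 60, 75, 20, 50, 75, 30, 55, 75, 40, 60]
print(post_gc_troughs(samples))
print(floor_is_rising(samples))
```

A steadily rising floor suggests that something is staying reachable on the heap between collections, but it does not say what; that still needs a heap dump or the per-component stats.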

warkolm · April 23, 2020, 11:48pm

Are you using the monitoring functionality on your cluster?

tyler-3 · April 24, 2020, 12:54am

Not the standard monitoring tools. I'm using AWS's managed Elasticsearch but I'm preparing to move off of it over the next few days. Being unable to get a heap dump has really stymied my debugging.

warkolm · April 24, 2020, 1:01am

Ahh yeah, they do have somewhat limited tools in that area unfortunately.

You can try out elastic.co/cloud as it includes some good things there. It'll show things like query and indexing rates, resource usage and more. You won't get a heapdump though, as it's *aaS, so no host access.

However, some other questions that might help:

what size heap?
how many nodes, shards, indices?

tyler-3 · April 24, 2020, 1:19am

Generally 5 nodes (but I've tried 3 as well), 10 shards per index, 18 indexes.

I've tried everything from 5x8GB nodes (assume heap is 40-50% of that) to 5x32GB nodes. The larger nodes last longer, I suspect simply because of the larger heap. However, I have to keep triggering Amazon rollovers every day or so before the cluster crashes (even with twice the machine power as before the upgrade).

The strange detail is that the total index size is not large (~25 GB) and generally CPU usage remains fairly low (Max around 35%). It's a real mystery.

warkolm · April 24, 2020, 1:36am

What sort of queries are you running, against what sort of data structure(s)?

tyler-3 · April 24, 2020, 1:54am

The queries are all over the map, at around 2,500 per minute. A lot of aggregations and filter operations (there are 471 different Elasticsearch calls). Indexing averages a couple hundred per minute.

The documents are generally fairly small (although there are millions of them) and have a couple dozen fields each on average (with the exception of one index that has almost a hundred).

I've looked at copious amounts of memory output from various Elasticsearch stats endpoints and everything looks fine besides the heap used number. The query cache is only in the megabytes. Fielddata is almost 0.

It would be nice if Elasticsearch had a "data associated with this index uses 5 GB of heap" endpoint, but I haven't found one. I'm hoping a single index is causing the trouble, since that would help isolate the issue.
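
There is no single "heap per index" endpoint, but the index stats API does report a few heap-resident components per index: segment memory, fielddata, query cache and request cache. The sketch below sums those per index from a `GET /_stats/segments,fielddata,query_cache,request_cache` response. It is a rough approximation only: it assumes the 7.x response layout, the `total` figures cover all shard copies across the cluster rather than one node, and anything not tracked by these counters (in-flight requests, aggregation buffers, cluster state) is invisible to it. The helper names and the sample index names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

# (section, field) pairs that the index stats API reports per index.
HEAP_FIELDS = [
    ("segments", "memory_in_bytes"),
    ("fielddata", "memory_size_in_bytes"),
    ("query_cache", "memory_size_in_bytes"),
    ("request_cache", "memory_size_in_bytes"),
]


def accounted_heap_by_index(stats: dict) -> List[Tuple[str, int]]:
    """Sum the tracked heap components per index, largest first."""
    totals = {}
    for name, body in stats.get("indices", {}).items():
        total = body.get("total", {})
        # Missing sections or fields count as zero.
        totals[name] = sum(
            total.get(section, {}).get(field, 0) for section, field in HEAP_FIELDS
        )
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def fetch_index_stats(base_url: str) -> dict:
    """Fetch index stats from a cluster (assumes no authentication is required)."""
    url = base_url.rstrip("/") + "/_stats/segments,fielddata,query_cache,request_cache"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Made-up response fragment, trimmed to the fields used above.
sample_stats = {
    "indices": {
        "orders": {"total": {
            "segments": {"memory_in_bytes": 1000},
            "fielddata": {"memory_size_in_bytes": 0},
            "query_cache": {"memory_size_in_bytes": 200},
            "request_cache": {"memory_size_in_bytes": 50},
        }},
        "events": {"total": {
            "segments": {"memory_in_bytes": 5000},
            "fielddata": {"memory_size_in_bytes": 10},
        }},
    }
}

for index_name, heap_bytes in accounted_heap_by_index(sample_stats):
    print(index_name, heap_bytes)
```

Against a live cluster you would pass the result of `fetch_index_stats("http://localhost:9200")` instead of `sample_stats`; the URL is a placeholder. If the per-index sums are tiny compared to the heap actually in use, as described above, the growth is probably coming from something these counters do not track.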

tyler-3 · April 24, 2020, 9:21pm

I'm spinning up my own cluster. Will report back on results.

tyler-3 · April 26, 2020, 10:07pm

I've been running my own cluster all day side by side with the Amazon cluster, forwarding traffic to both. My cluster is running perfectly and the Amazon cluster can't stay up for more than 5 hours. My cluster also costs over 75% less.

I bet Amazon messed up their JVM options.
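
A hunch like this can be checked by comparing the JVM arguments each cluster reports. The nodes info API (`GET _nodes/jvm`) lists them under `nodes.<id>.jvm.input_arguments`. The sketch below diffs two such responses; whether a managed service exposes this endpoint in full is an assumption, and the argument values in the sample are invented for illustration, not what either cluster in this thread reported.

```python
from typing import Dict, List, Set


def jvm_arguments(nodes_info: dict) -> Set[str]:
    """Collect the JVM input arguments reported by every node in a _nodes/jvm response."""
    args: Set[str] = set()
    for node in nodes_info.get("nodes", {}).values():
        args.update(node.get("jvm", {}).get("input_arguments", []))
    return args


def diff_jvm_arguments(cluster_a: dict, cluster_b: dict) -> Dict[str, List[str]]:
    """Return the arguments that appear in only one of the two clusters."""
    args_a, args_b = jvm_arguments(cluster_a), jvm_arguments(cluster_b)
    return {
        "only_in_a": sorted(args_a - args_b),
        "only_in_b": sorted(args_b - args_a),
    }


# Invented sample responses, trimmed to the field used above.
cluster_a = {"nodes": {"n1": {"jvm": {"input_arguments": [
    "-Xms16g", "-Xmx16g", "-XX:+UseG1GC",
]}}}}
cluster_b = {"nodes": {"n1": {"jvm": {"input_arguments": [
    "-Xms16g", "-Xmx16g", "-XX:+UseConcMarkSweepGC",
    "-XX:CMSInitiatingOccupancyFraction=75",
]}}}}

print(diff_jvm_arguments(cluster_a, cluster_b))
```

Differences in heap size (`-Xms`/`-Xmx`) or collector flags would be the first things to look at, though a difference alone does not prove it is the cause.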

