Hi all,
We have a deployment in Elastic Cloud, and since yesterday the JVM has been running at very high heap usage (about 85-90%) and logging a few errors. Any help would be appreciated. I got this log through the console:

[instance-0000000020] collector [cluster_stats] failed to collect data
    at ~[elasticsearch-7.10.1.jar:7.10.1]
    at$1.onFailure( ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure( ~[elasticsearch-7.10.1.jar:7.10.1]
    at ~[elasticsearch-7.10.1.jar:7.10.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]
    at java.util.concurrent.ThreadPoolExecutor$ [?:?]
    at [?:?]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<reduce_aggs>] would be [2074589158/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2074588864/1.9gb], new bytes reserved: [294/294b], usages [request=881/881b, fielddata=12417733/11.8mb, in_flight_requests=0/0b, model_inference=0/0b, accounting=14812302/14.1mb]
    at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit( ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak( ~[elasticsearch-7.10.1.jar:7.10.1]
    at$PendingMerges.addEstimateAndMaybeBreak( ~[elasticsearch-7.10.1.jar:7.10.1]
    at ~[elasticsearch-7.10.1.jar:7.10.1]
    at ~[elasticsearch-7.10.1.jar:7.10.1]
    at$000( ~[elasticsearch-7.10.1.jar:7.10.1]
    at$1.doRun( ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( ~[elasticsearch-7.10.1.jar:7.10.1]
    at ~[elasticsearch-7.10.1.jar:7.10.1]
    ... 3 more
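
For reference, the numbers in the CircuitBreakingException line can be pulled apart with a short script. This is only a sketch: the function name parse_breaker_message and the regular expressions are my own, written against the message text exactly as it appears in the log above, and may not match other Elasticsearch versions. It shows that the request being refused reserves just 294 bytes, that the heap was already about 33 MB over the parent limit before that request, and that the tracked child breakers (request, fielddata, accounting, and so on) add up to only about 26 MB, roughly 1.3% of the real usage. In other words, most of the heap is held by something the child breakers do not account for.

```python
import re

# Breaker line copied from the log above.
MESSAGE = (
    "[parent] Data too large, data for [<reduce_aggs>] would be [2074589158/1.9gb], "
    "which is larger than the limit of [2040109465/1.8gb], real usage: [2074588864/1.9gb], "
    "new bytes reserved: [294/294b], usages [request=881/881b, fielddata=12417733/11.8mb, "
    "in_flight_requests=0/0b, model_inference=0/0b, accounting=14812302/14.1mb]"
)


def parse_breaker_message(message):
    """Extract the raw byte counts from a parent circuit breaker message.

    Hypothetical helper: the patterns below only target the wording seen
    in this particular log line.
    """
    def first_int(pattern):
        return int(re.search(pattern, message).group(1))

    # The per-breaker usages are listed as name=bytes/human inside "usages [...]".
    usages_text = re.search(r"usages \[(.*)\]", message).group(1)
    children = {
        name: int(value)
        for name, value in re.findall(r"(\w+)=(\d+)/", usages_text)
    }
    return {
        "would_be": first_int(r"would be \[(\d+)/"),
        "limit": first_int(r"limit of \[(\d+)/"),
        "real_usage": first_int(r"real usage: \[(\d+)/"),
        "reserved": first_int(r"new bytes reserved: \[(\d+)/"),
        "children": children,
    }


info = parse_breaker_message(MESSAGE)
tracked = sum(info["children"].values())

print("bytes over the parent limit:", info["would_be"] - info["limit"])
print("bytes tracked by child breakers:", tracked)
print("tracked share of real usage: {:.2%}".format(tracked / info["real_usage"]))
```

The limit of 2040109465 bytes is consistent with 95% of a 2 GiB heap, which as far as I know is the default parent breaker limit when real memory tracking is enabled, so this instance appears to have a 2 GiB heap.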

I keep receiving this kind of message from the garbage collector:

[instance-0000000020] [gc][347732] overhead, spent [869ms] collecting in the last [1s]

What is the output from the _cluster/stats?pretty&human API?
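
If it helps, here is one way to fetch that from a script instead of the console. It is a minimal sketch using only the Python standard library; the endpoint URL and API key are placeholders you would replace with your own deployment's values, the helper names are made up for illustration, and it assumes API key authentication is enabled on the deployment.

```python
import json
import urllib.request


def build_cluster_stats_request(base_url, api_key):
    """Build (but do not send) a GET request for _cluster/stats?pretty&human.

    base_url and api_key are placeholders for your deployment's
    Elasticsearch endpoint and a base64-encoded API key.
    """
    url = base_url.rstrip("/") + "/_cluster/stats?pretty&human"
    headers = {"Authorization": "ApiKey " + api_key}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_cluster_stats(base_url, api_key):
    """Send the request and return the parsed JSON response."""
    request = build_cluster_stats_request(base_url, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (not run here, since it needs a real endpoint and key):
# stats = fetch_cluster_stats("https://<your-endpoint>:9243", "<your-api-key>")
# print(json.dumps(stats["indices"]["shards"], indent=2))
```

The shard and segment counts in that output are usually the first thing to look at on a small heap.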

Hi @warkolm,

I made some changes to my cluster. First I disabled X-Pack monitoring through the API (it was collecting data and shipping it to this same deployment, which is not ideal).
I also deleted about 70 shards, increased the refresh interval to 30s for the remaining ones, and ran force_merge on a bunch of others. Then I rebooted, and the cluster has been working flawlessly for about 24 hours.
Thanks so much for the help!
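
For anyone finding this later, the steps above roughly correspond to the requests sketched below. Treat it as an illustration rather than exactly what was run: the index names are invented, the helper remediation_requests is hypothetical, and it assumes that "disabling monitoring through the API" means setting xpack.monitoring.collection.enabled to false in the cluster settings. The script only builds and prints the requests; it does not send anything.

```python
import json


def remediation_requests(indices_to_delete, indices_to_keep, indices_to_merge):
    """Return (method, path, body) tuples for the clean-up steps.

    All index names passed in are placeholders chosen by the caller.
    """
    requests = [
        # Assumption: monitoring collection is switched off via cluster settings.
        ("PUT", "/_cluster/settings",
         {"persistent": {"xpack.monitoring.collection.enabled": False}}),
    ]
    # Deleting whole indices is how their shards get removed.
    for name in indices_to_delete:
        requests.append(("DELETE", "/" + name, None))
    # Refresh less often on the indices that stay.
    for name in indices_to_keep:
        requests.append(
            ("PUT", "/" + name + "/_settings", {"index": {"refresh_interval": "30s"}})
        )
    # Force merge indices that are no longer being written to.
    for name in indices_to_merge:
        requests.append(("POST", "/" + name + "/_forcemerge", None))
    return requests


# Hypothetical index names, for illustration only.
for method, path, body in remediation_requests(
    ["old-logs-2020.11"], ["logs-current"], ["logs-2020.12"]
):
    print(method, path, json.dumps(body) if body is not None else "")
```

Force merging is generally only advisable on indices that have stopped receiving writes.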