org.elasticsearch.action.search.SearchPhaseExecutionException - collector failed to collect data

Hi all,
We have a deployment in Elastic Cloud, and since yesterday my JVM is with a very high usage (about 85-90% of heap memory) and giving a few errors. Any help would be appreciated. I got this log through the console at cloud.elastic.co .

[instance-0000000020] collector [cluster_stats] failed to collect data
org.elasticsearch.action.search.SearchPhaseExecutionException:
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:568) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:100) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:725) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) ~[elasticsearch-7.10.1.jar:7.10.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<reduce_aggs>] would be [2074589158/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2074588864/1.9gb], new bytes reserved: [294/294b], usages [request=881/881b, fielddata=12417733/11.8mb, in_flight_requests=0/0b, model_inference=0/0b, accounting=14812302/14.1mb]
    at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:346) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:109) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.QueryPhaseResultConsumer$PendingMerges.addEstimateAndMaybeBreak(QueryPhaseResultConsumer.java:279) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:109) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:47) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.1.jar:7.10.1]
    ... 3 more
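For context: the `Caused by` line means the parent circuit breaker tripped while reducing aggregation results (`<reduce_aggs>`). Real heap usage (~1.9gb) was already above the parent limit of 1.8gb; 2040109465 bytes is exactly 95% of a 2 GB heap, which is the default `indices.breaker.total.limit` in 7.x when the real-memory circuit breaker is enabled. One way to watch the live breaker counters (a diagnostic suggestion, not something from the original thread) is:

GET _nodes/stats/breaker

This returns, per node, the limit, estimated size, and trip count for each breaker (parent, request, fielddata, in_flight_requests, etc.), which helps confirm whether the parent breaker is tripping on real heap pressure rather than on a single oversized request.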

I keep receiving this kind of message from the garbage collector:

[instance-0000000020] [gc][347732] overhead, spent [869ms] collecting in the last [1s]

What is the output from the _cluster/stats?pretty&human API?
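(That request can be run from the Kibana Dev Tools console as shown below; it returns cluster-wide counts of nodes, indices, shards, and segments, plus JVM heap usage, which is useful for spotting oversharding and memory pressure.)

GET _cluster/stats?pretty&human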

Hi @warkolm,

I did a few things in my cluster. First I disabled X-Pack monitoring through the API (it was collecting data and shipping it to this same deployment, which is not ideal).
I deleted about 70 shards, increased the refresh interval to 30s on the remaining indices, and ran force_merge on a bunch of others. Then I rebooted, and the cluster has been working flawlessly for about 24 hours.
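For anyone landing on this thread, the steps above roughly map to the following API calls (a sketch only: `my-index` is a placeholder, and the exact monitoring setting depends on how collection was enabled in your deployment):

# Stop Elasticsearch-managed monitoring collection
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.collection.enabled": false
  }
}

# Reduce refresh pressure on an index (my-index is a placeholder name)
PUT my-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}

# Merge an index's segments down to one (best done on indices no longer being written to)
POST my-index/_forcemerge?max_num_segments=1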
Thanks so much for the help!