org.elasticsearch.action.search.SearchPhaseExecutionException - collector failed to collect data

Hi all,
We have a deployment in Elastic Cloud, and since yesterday JVM heap usage has been very high (around 85-90%) and we are seeing a few errors. Any help would be appreciated. I got this log through the console at cloud.elastic.co .

[instance-0000000020] collector [cluster_stats] failed to collect data
org.elasticsearch.action.search.SearchPhaseExecutionException:
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:568) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:100) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:725) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) ~[elasticsearch-7.10.1.jar:7.10.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<reduce_aggs>] would be [2074589158/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2074588864/1.9gb], new bytes reserved: [294/294b], usages [request=881/881b, fielddata=12417733/11.8mb, in_flight_requests=0/0b, model_inference=0/0b, accounting=14812302/14.1mb]
    at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:346) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:109) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.QueryPhaseResultConsumer$PendingMerges.addEstimateAndMaybeBreak(QueryPhaseResultConsumer.java:279) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:109) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:47) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.1.jar:7.10.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.1.jar:7.10.1]
    ... 3 more
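If I read it right, this is the [parent] circuit breaker tripping: real heap usage (1.9gb) is already over the 1.8gb limit, which looks like the default 95% of a 2gb heap, so the node rejects the aggregation reduce rather than risk an OutOfMemoryError. To see the per-node breaker usage I ran the standard node stats APIs (Dev Tools syntax below):

# Per-node circuit breaker limits and current estimated usage
GET _nodes/stats/breaker?human

# Heap usage per node at a glance
GET _cat/nodes?v&h=name,heap.percent,heap.current,heap.max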

I keep receiving this kind of message from the garbage collector:

[instance-0000000020] [gc][347732] overhead, spent [869ms] collecting in the last [1s]
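which I understand to mean the JVM spent 869ms of the last second doing garbage collection, i.e. the heap is under constant pressure. For completeness, I also pulled the per-node JVM stats:

# Heap pools, GC counts and timings per node
GET _nodes/stats/jvm?human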

What is the output from the _cluster/stats?pretty&human API?
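That is, in Dev Tools:

GET _cluster/stats?human&pretty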

Hi @warkolm,

I did a few things in my cluster. First I disabled X-Pack monitoring through the API (it was collecting data and shipping it into this same deployment, which was not great).
I then deleted about 70 shards, increased the refresh interval to 30s on the remaining indices, and called force_merge on a bunch of others (rough calls below). Then I rebooted, and the cluster has been working flawlessly for about 24 hours.
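In case it helps anyone else hitting this, the calls were roughly these (my-index is a placeholder for the real index names):

# Turn off self-monitoring collection (it was shipping stats back into this same deployment)
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.collection.enabled": false
  }
}

# Relax the refresh interval on the remaining indices
PUT my-index/_settings
{
  "index": { "refresh_interval": "30s" }
}

# Merge segments down on older indices that are no longer being written to
POST my-index/_forcemerge?max_num_segments=1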
Thanks so much for the help!
