Huge aggregation triggers CircuitBreaker in loop

elephant · March 14, 2022, 3:38pm

Hi,

I was trying to run aggregations on a very large index. The search triggered the circuit breaker, which was expected given the size of the index (billions of documents):

elastic1_1          | "Caused by: org.elasticsearch.transport.RemoteTransportException: [data03][172.142.0.2:9330][indices:data/write/bulk]",
elastic1_1          | "Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [indices:data/write/bulk] would be [31664846682/29.4gb], which is larger than the limit of [31621696716/29.4gb], real usage: [31664842872/29.4gb], new bytes reserved: [3810/3.7kb], usages [request=0/0b, fielddata=27198847678/25.3gb, in_flight_requests=3810/3.7kb, model_inference=0/0b, eql_sequence=0/0b, accounting=109554198/104.4mb]",
elastic1_1          | "at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:460) ~[elasticsearch-7.16.2.jar:7.16.2]",
elastic1_1          | "at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:108) ~[elasticsearch-7.16.2.jar:7.16.2]",
elastic1_1          | "at org.elasticsearch.transport.InboundAggregator.checkBreaker(InboundAggregator.java:213) ~[elasticsearch-7.16.2.jar:7.16.2]",
elastic1_1          | "at org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:117) ~[elasticsearch-7.16.2.jar:7.16.2]",
elastic1_1          | "at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:145) ~[elasticsearch-7.16.2.jar:7.16.2]",
elastic1_1          | "at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:119) ~[elasticsearch-7.16.2.jar:7.16.2]",
elastic1_1          | "at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:84) ~[elasticsearch-7.16.2.jar:7.16.2]",

The problem is that the error is triggered in a loop, which then causes further errors (incoming documents cannot be indexed, nodes disappear, etc.).

It looks like the cluster keeps retrying the aggregation request in a loop and never cancels it, even after the CircuitBreaker was hit. Does that make sense?

Thanks

elephant · March 24, 2022, 9:33am

Hi,

I am still having the same issue. Is there a way to automatically cancel a search that triggers the CircuitBreaker? It seems the nodes in my cluster keep trying to calculate the aggregations indefinitely, which triggers the CircuitBreaker in a loop.
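
One manual route for the cancellation question is the task management API: list the running search tasks, then cancel the ones that have been running too long. The following is only a minimal sketch. It assumes the cluster is reachable at http://localhost:9200 without authentication; the helper names and the sample response are made up for illustration. It is a manual workaround, not the automatic cancellation asked about here.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def list_search_tasks_url(base_url=ES_URL):
    # Task management API: running tasks whose action name contains "search".
    return f"{base_url}/_tasks?actions=*search*&detailed=true"


def cancel_task_url(task_id, base_url=ES_URL):
    # Cancelling a task is a POST to /_tasks/<task_id>/_cancel.
    return f"{base_url}/_tasks/{task_id}/_cancel"


def long_running_tasks(tasks_response, min_seconds):
    """Return IDs of cancellable tasks that have run for at least min_seconds."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task["running_time_in_nanos"] / 1e9
            if task.get("cancellable") and seconds >= min_seconds:
                found.append(task_id)
    return sorted(found)


def cancel_long_searches(min_seconds=60, base_url=ES_URL):
    # Talks to a live cluster, so it is defined here but not called.
    with urllib.request.urlopen(list_search_tasks_url(base_url)) as resp:
        tasks = json.load(resp)
    for task_id in long_running_tasks(tasks, min_seconds):
        req = urllib.request.Request(cancel_task_url(task_id, base_url), method="POST")
        urllib.request.urlopen(req).close()


# Hypothetical, trimmed-down response used only to exercise the filter.
SAMPLE = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {
                    "action": "indices:data/read/search",
                    "cancellable": True,
                    "running_time_in_nanos": 120_000_000_000,
                },
                "nodeA:102": {
                    "action": "indices:data/read/search",
                    "cancellable": True,
                    "running_time_in_nanos": 2_000_000_000,
                },
            }
        }
    }
}

print(long_running_tasks(SAMPLE, min_seconds=60))
```

Only the filter runs here; cancel_long_searches would have to be called explicitly (for example from a cron job) against a real cluster.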

spinscale · March 24, 2022, 1:13pm

The circuit breaker exception above was raised for a bulk indexing request, not for a query (it may be a consequence of your initial issue). Are you sending huge or many bulk requests as well?

Also, knowing the Elasticsearch version you are using would help to get a first idea.
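
To make the point about the rejected operation concrete, the exception message from the first post can be picked apart to show which action was refused and which child breaker accounts for most of the memory. This is a rough sketch: the message text is copied from the log above, while the function name and the regular expressions are my own and assume the message keeps this exact layout.

```python
import re

# Message text copied from the log excerpt in the first post.
MESSAGE = (
    "[parent] Data too large, data for [indices:data/write/bulk] would be "
    "[31664846682/29.4gb], which is larger than the limit of "
    "[31621696716/29.4gb], real usage: [31664842872/29.4gb], "
    "new bytes reserved: [3810/3.7kb], usages [request=0/0b, "
    "fielddata=27198847678/25.3gb, in_flight_requests=3810/3.7kb, "
    "model_inference=0/0b, eql_sequence=0/0b, accounting=109554198/104.4mb]"
)


def parse_breaker_message(message):
    """Extract the rejected action, the parent limit and per-breaker usage in bytes."""
    action = re.search(r"data for \[([^\]]+)\]", message).group(1)
    limit = int(re.search(r"limit of \[(\d+)/", message).group(1))
    usages_text = re.search(r"usages \[(.*)\]", message).group(1)
    usages = {
        name: int(size)
        for name, size in re.findall(r"(\w+)=(\d+)/", usages_text)
    }
    return action, limit, usages


action, limit, usages = parse_breaker_message(MESSAGE)
largest = max(usages, key=usages.get)
share = usages[largest] / limit

print(f"rejected action: {action}")
print(f"largest child breaker: {largest} ({share:.0%} of the parent limit)")
```

For this message the rejected action is a bulk write that reserved only 3.7kb, while fielddata alone holds roughly 86% of the parent limit.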

elephant · March 25, 2022, 8:40pm

Hi,

You are totally right, I overlooked some details in the error message because I was focusing on the "InboundAggregator.finishAggregation" line.

Here are some more details:

Elasticsearch version: 7.16.2
How to trigger the problem: launch an aggregation on a field with very high cardinality
What happens: the fielddata cache grows to the point where the circuit breaker triggers because there is no more heap available

I assumed that it was the aggregation query that triggered the CircuitBreaker, but you are right: it might be other operations, such as a small bulk indexing request, that trip it because there is no more heap available.

Looking at the stats, I can clearly see that the fielddata cache uses all of the heap memory because of the aggregations.
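
For anyone who wants to check the same thing, per-index and per-field fielddata usage can be read from the index stats API with GET /_stats/fielddata?fields=*. Below is a small sketch of how such a response could be ranked. The sample response is hypothetical (index names, field names and sizes are invented) and trimmed to the keys the code reads; the helper names are my own.

```python
def fielddata_by_index(stats_response):
    """Map index name -> {field name: bytes} from an index stats response."""
    result = {}
    for index, data in stats_response.get("indices", {}).items():
        fields = data["total"]["fielddata"].get("fields", {})
        result[index] = {
            name: info["memory_size_in_bytes"] for name, info in fields.items()
        }
    return result


def top_consumers(stats_response, n=3):
    """Return the n largest (bytes, index, field) entries, biggest first."""
    rows = [
        (size, index, field)
        for index, fields in fielddata_by_index(stats_response).items()
        for field, size in fields.items()
    ]
    return sorted(rows, reverse=True)[:n]


# Hypothetical response for GET /_stats/fielddata?fields=*
SAMPLE_STATS = {
    "indices": {
        "logs-a": {
            "total": {
                "fielddata": {
                    "memory_size_in_bytes": 9_000_000_000,
                    "fields": {
                        "user_id": {"memory_size_in_bytes": 8_000_000_000},
                        "document": {"memory_size_in_bytes": 1_000_000_000},
                    },
                }
            }
        },
        "logs-b": {
            "total": {
                "fielddata": {
                    "memory_size_in_bytes": 2_000_000_000,
                    "fields": {
                        "document": {"memory_size_in_bytes": 2_000_000_000},
                    },
                }
            }
        },
    }
}

for size, index, field in top_consumers(SAMPLE_STATS):
    print(f"{index:8} {field:10} {size / 2**30:.1f} GiB")
```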

The problem is that the fielddata cache is never flushed, even though the aggregation was doomed to fail for lack of heap memory. This has the side effect of killing our cluster (nearly every other action triggers the CircuitBreaker in a loop).

As a side note, I managed to reproduce this with aggregations on fields that have nearly hundreds of millions of distinct values, but also on flattened fields. The strange part with flattened fields is that when I ask for an aggregation on document.name, for example, Elasticsearch seems to use heap to calculate the aggregation on document.* in every index, even though document.name is only present in a few indices (I might be wrong here; debugging is quite tedious).

Solution I found: don't ask for aggregations on very high cardinality fields
What I expected:

Elasticsearch would filter the documents first, then calculate the aggregations for flattened types. It seems to do the reverse: calculate all possible aggregations on flattened types, then filter. I am not sure about this, but judging by the fielddata cache per index, that is what happens
some sort of automatic fielddata flushing instead of hitting the CircuitBreaker in a loop for every other action
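
Until something automatic exists, the closest manual equivalents I know of are the clear cache API (POST /_cache/clear?fielddata=true) and a lower fielddata breaker limit via the dynamic cluster setting indices.breaker.fielddata.limit. The sketch below only builds those two requests. The URL, the "30%" value and the helper names are assumptions for illustration, not recommendations. Cache eviction itself is governed by the static node setting indices.fielddata.cache.size in elasticsearch.yml, which is unbounded by default and cannot be changed through this API.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def clear_fielddata_request(index=None, base_url=ES_URL):
    """Build a clear cache request limited to fielddata, optionally for one index."""
    path = f"/{index}/_cache/clear" if index else "/_cache/clear"
    return urllib.request.Request(f"{base_url}{path}?fielddata=true", method="POST")


def fielddata_breaker_limit_request(limit, base_url=ES_URL):
    """Build a cluster settings update for the fielddata breaker limit."""
    body = {"persistent": {"indices.breaker.fielddata.limit": limit}}
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


clear_req = clear_fielddata_request()
limit_req = fielddata_breaker_limit_request("30%")  # illustrative value only

print(clear_req.get_method(), clear_req.full_url)
print(limit_req.get_method(), limit_req.full_url, limit_req.data.decode("utf-8"))

# To actually send a request against a live cluster:
# urllib.request.urlopen(clear_req).close()
```

A lower breaker limit makes the offending aggregation fail earlier, before fielddata crowds out everything else, but it does not release memory that is already cached; that still needs the clear cache call.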

Thanks.

elephant · April 1, 2022, 11:15am

I haven't found a solution to our problem yet. I see many posts about CircuitBreaker errors, but few (if any) answers.

system · April 29, 2022, 11:16am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
