CircuitBreakingException - Data too large

Following on from this topic, we've extracted and expanded on this excellent explanation from @HenningAndersen.

Seeing an error like this means that Elasticsearch prevented a request from executing to avoid an out of memory (OOM) error. The documentation on circuit breakers goes into more detail on the approach to the various breakers, and how they are configured.

Here's an example of the response a client receives when a circuit breaker is tripped:

[2020-06-25T16:05:11,629][WARN ][r.suppressed ] [host-name] path: /_bulk, params: {}
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<http_request>] would be [11229874195/10.4gb], which is larger than the limit of [11225477939/10.4gb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=11226443799/10.4gb, accounting=3430396/3.2mb]

Let's highlight a few sections to better understand what's happening:

  • The first section of note in the entry is the path value, which shows this was a request to the _bulk API.
  • Next, it says [parent], which means the parent breaker tripped. This breaker is responsible for the overall memory usage. Since Elasticsearch 7.0, it is a real memory circuit breaker by default, measuring actual heap usage rather than tracked estimates.
  • Then [<http_request>] is the info/type of the request, which aligns with the first point. Another common one would be transport_request, which is the internal communication protocol from one node to another, either internally in the cluster or between two remote clusters.
  • Then [11229874195/10.4gb] means that the current memory usage, together with the memory required by this request, would total 11,229,874,195 bytes (10.4gb).
  • The limit of [11225477939/10.4gb] is the value the total must stay below for the request to be allowed through to Elasticsearch and processed.
  • The usages section then breaks down current memory consumption by breaker: request, fielddata, in_flight_requests, and accounting.
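
The fields above follow a regular pattern, so they can be pulled out programmatically. A minimal sketch (the parsing logic here is illustrative, not part of any Elasticsearch client library):

```python
import re

def parse_usages(message):
    """Parse the 'usages' section of a CircuitBreakingException
    message into a dict of breaker name -> bytes in use."""
    match = re.search(r"usages \[([^\]]+)\]", message)
    if not match:
        return {}
    usages = {}
    for part in match.group(1).split(", "):
        name, value = part.split("=")
        usages[name] = int(value.split("/")[0])  # keep the raw byte count
    return usages

# The message from the log entry above
message = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[11229874195/10.4gb], which is larger than the limit of "
    "[11225477939/10.4gb], usages [request=0/0b, fielddata=0/0b, "
    "in_flight_requests=11226443799/10.4gb, accounting=3430396/3.2mb]"
)

print(parse_usages(message))
```

Here the in_flight_requests breaker accounts for essentially all of the usage, which fits a large _bulk payload being held in memory while it is processed.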

The important thing to note here is that the request would exceed the circuit breaker limit by only 4396256 bytes, or roughly 4.2MB. This is easy to miss, because Elasticsearch rounds both byte counts to the same "10.4gb" in the message.
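
The overshoot is easy to verify directly from the raw byte counts in the message:

```python
would_be = 11_229_874_195  # projected usage including this request (bytes)
limit = 11_225_477_939     # parent circuit breaker limit (bytes)

overshoot = would_be - limit
print(overshoot)                      # 4396256 bytes
print(round(overshoot / 1024**2, 2))  # ~4.19 MiB over the limit
```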

There are a few things to look into if you encounter this response:

  1. In this case, check the size of your bulk request and try reducing it. Otherwise:
  2. Something else may be holding on to excessive amounts of memory. Note that some parts of Elasticsearch scale their memory usage with heap size.
  3. Garbage collection (GC) cannot, or did not, keep up with garbage accumulating in the heap, pushing the node above the circuit breaker limit.
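
If you control the indexing client, point 1 can be addressed by splitting bulk payloads into bounded batches before sending them. A minimal sketch; the 5MB default is an illustrative client-side choice, not an Elasticsearch setting:

```python
def chunk_bulk_actions(actions, max_bytes=5 * 1024 * 1024):
    """Group newline-delimited bulk action strings into batches
    whose total payload size stays under max_bytes."""
    batch, size = [], 0
    for action in actions:
        action_size = len(action.encode("utf-8")) + 1  # +1 for trailing newline
        if batch and size + action_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(action)
        size += action_size
    if batch:
        yield batch

# Example: ten 100-byte actions split into batches of at most 250 bytes
actions = ["x" * 100 for _ in range(10)]
batches = list(chunk_bulk_actions(actions, max_bytes=250))
print(len(batches))  # 5 batches of 2 actions each
```

Each batch can then be sent as a separate _bulk request, keeping any single request comfortably under the breaker limit.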