Elasticsearch Request Circuit Breaker

I am looking at the indices.breaker.request.limit circuit breaker. Based on the description, my understanding is that it should trip when a request (e.g. an aggregation) would consume more heap than the breaker allows. The default is 60% of the heap for version 6.8.
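For reference, my understanding is that this limit can be adjusted dynamically through the cluster settings API, along these lines (60% here is just the documented 6.8 default):

    PUT _cluster/settings
    {
      "persistent": {
        "indices.breaker.request.limit": "60%"
      }
    }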

We are running high-cardinality terms aggregations. The breaker metrics show that they barely go beyond a handful of megabytes, yet the cluster seems to turn red. I am fairly certain that higher-cardinality aggregations actually take far more memory than that.
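For reference, the request is essentially a plain terms aggregation on a very high-cardinality field, roughly of this shape (the index and field names here are made up for illustration):

    POST my-index/_search
    {
      "size": 0,
      "aggs": {
        "by_user": {
          "terms": {
            "field": "user_id",
            "size": 100000
          }
        }
      }
    }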

I am wondering whether I am misunderstanding the circuit breaker, whether there is a known bug, or whether I am simply missing something. Your insight would be helpful. Thank you.

What do the Elasticsearch logs show?

I believe at least some of the circuit breakers estimate the amount of memory a request will need and trip if the estimate is too large, rather than waiting until usage actually goes beyond the limit.
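If it helps, you can see each breaker's current estimate, configured limit and trip count via the node stats API:

    GET _nodes/stats/breaker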

The log shows an out-of-heap-space error. Nothing about the circuit breaker, though.

It'd be useful to share the logs.


Logs and heap space: node b3 (a data node) consumed ~29G of heap (it was at about 14G and spiked up quickly because of a high-cardinality request). Eventually it disconnected from the cluster (the master probably kicked it out after a timeout). The main point is that the circuit breaker numbers are not even close to the heap actually consumed by the high-cardinality requests: the breaker didn't trip and protect the cluster.

We also tried setting the in-flight requests breaker to 1G and 2G.
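Concretely, we changed it roughly like this (assuming the in-flight requests breaker is the right setting; 1gb and 2gb were the two values we tried):

    PUT _cluster/settings
    {
      "transient": {
        "network.breaker.inflight_requests.limit": "1gb"
      }
    }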


[2021-06-16T21:13:07,729][WARN ][o.e.m.j.JvmGcMonitorService] [esdata-b3-cluster] [gc][old][537][5] duration [1.6m], collections [1]/[1.6m], total [1.6m]/[3m], memory [27.5gb]->[28.4gb]/[29.7gb], all_pools {[young] [449.4kb]->[712mb]/[1.8gb]}{[survivor] [232.9mb]->[0b]/[232.9mb]}{[old] [27.3gb]->[27.7gb]/[27.7gb]}

[2021-06-16T21:11:26,065][WARN ][o.e.m.j.JvmGcMonitorService] [esdata-b3] [gc][old][535][4] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[1.4m], memory [27.2gb]->[26.2gb]/[29.7gb], all_pools {[young] [50.7mb]->[39.5mb]/[1.8gb]}{[survivor] [232.9mb]->[0b]/[232.9mb]}{[old] [27gb]->[26.2gb]/[27.7gb]}
... (more logs in between) ...
[2021-06-16T21:09:39,557][WARN ][o.e.m.j.JvmGcMonitorService] [esdata-b3-cluster] [gc][young][523][30] duration [1.6s], collections [1]/[1.9s], total [1.6s]/[38.6s], memory [14.9gb]->[15.9gb]/[29.7gb], all_pools {[young] [11.7mb]->[1.9mb]/[1.8gb]}{[survivor] [232.9mb]->[232.9mb]/[232.9mb]}{[old] [14.6gb]->[15.7gb]/[27.7gb]}

I think you're misunderstanding the circuit breaker: it does not account for every memory consumer, so it will sometimes miss things. Note that the 6.8 series was released over two years ago, so you're missing out on two years of further development in this area. Recent versions track memory usage much more accurately.


Thanks. Would it miss the high-cardinality terms aggregation? That is the specific aggregation that's tipping over the node (high memory usage).

Also, please let me know if there is detailed documentation about the circuit breaker that I can read (what it accounts for and what it does not).

I'm not sure, but from your account it sounds like it does.

The big change in newer versions is described in this blog post:

With that change the parent circuit breaker effectively tracks everything on the heap.
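Concretely, if I recall correctly, 7.x enables a real-memory parent breaker by default; the relevant settings (shown with their 7.x defaults) are:

    indices.breaker.total.use_real_memory: true
    indices.breaker.total.limit: 95%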

Unfortunately, an upgrade to 7.x is only on the roadmap at this point, given that it does not support a transport client and also requires moving away from types. So we have to continue with this version for a while. I hope Elastic continues to support its customers on version 6.8.

If there is an alternative way to protect the cluster from high-cardinality requests (bucket explosions), please let me know.

This is inaccurate:

  • The main change in the removal of mapping types already happened, in 6.0. The only changes in this area in subsequent versions relate to adjusting any APIs that unnecessarily mention the (now-unique) type.

  • The Java transport client is supported in all 7.x versions. It's deprecated, indicating our intention to remove support for it in 8.0, but that doesn't affect anything in 7.x.


Here's what I read for types:

Indices created in Elasticsearch 7.0.0 or later no longer accept a _default_ mapping. Indices created in 6.x will continue to function as before in Elasticsearch 6.x. Types are deprecated in APIs in 7.0, with breaking changes to the index creation, put mapping, get mapping, put template, get template and get field mappings APIs.

We are using the default type in 6.8, so it should keep working.

We are using the elastic4s Scala client for the APIs, and they stopped supporting updates to the TCP (transport) client, so we need to update it.

Yes, that sounds about right. There's no need for a _default_ mapping now that indices only have one type, nor is there any need to mention types in the APIs. These don't need any structural changes to adopt; you've already done the hard bit by moving to single-type indices in 6.x.
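For example, a single-type index in 6.8 versus 7.x differs only in whether the type name appears as a wrapper in the mapping (the field names here are purely illustrative).

In 6.8, with the single _doc type:

    PUT my-index
    {
      "mappings": {
        "_doc": {
          "properties": { "user_id": { "type": "keyword" } }
        }
      }
    }

In 7.x, with no type in the mapping:

    PUT my-index
    {
      "mappings": {
        "properties": { "user_id": { "type": "keyword" } }
      }
    }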

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.