Elasticsearch Request Circuit Breaker

I am looking at the indices.breaker.request.limit circuit breaker. Based on the description, my understanding is that it should trip when a request (e.g. an aggregation) would consume more heap than the breaker allows. The default is 60% of the heap for version 6.8.
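For reference, my understanding is that this limit can be adjusted dynamically through the cluster settings API, along these lines (60% here is just the documented 6.8 default):

    PUT _cluster/settings
    {
      "persistent": {
        "indices.breaker.request.limit": "60%"
      }
    }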

We are running high-cardinality terms aggregations. The breaker metrics show that they barely go beyond a handful of megabytes, yet the cluster seems to turn red. I am fairly certain that higher-cardinality aggregations actually take far more memory than that.
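For reference, the request is essentially a plain terms aggregation on a very high-cardinality field, roughly of this shape (the index and field names here are made up for illustration):

    POST my-index/_search
    {
      "size": 0,
      "aggs": {
        "by_user": {
          "terms": {
            "field": "user_id",
            "size": 100000
          }
        }
      }
    }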

I am wondering whether I am misunderstanding the circuit breaker, whether there is a known bug, or whether I am simply missing something. Your insight would be helpful. Thank you.

What do the Elasticsearch logs show?

I believe at least some of the circuit breakers estimate the amount of memory a request will need and trip if the estimate is too large, rather than waiting until usage actually goes beyond the limit.
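If it helps, you can see each breaker's current estimate, configured limit and trip count via the node stats API:

    GET _nodes/stats/breaker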

The log shows an out-of-heap-space error. Nothing about the circuit breaker, though.

It'd be useful to share the logs.


Logs and heap space: node b3 (a data node) consumed ~29G of heap (it was at about 14G and spiked up quickly because of a high-cardinality request). Eventually it disconnected from the cluster (the master probably kicked it out after a timeout). The main point is that the circuit breaker numbers are not even close to the heap actually consumed by the high-cardinality requests: the breaker didn't trip and protect the cluster.

We also tried setting the in-flight requests breaker to 1G and 2G.
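Concretely, we changed it roughly like this (assuming the in-flight requests breaker is the right setting; 1gb and 2gb were the two values we tried):

    PUT _cluster/settings
    {
      "transient": {
        "network.breaker.inflight_requests.limit": "1gb"
      }
    }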


[2021-06-16T21:13:07,729][WARN ][o.e.m.j.JvmGcMonitorService] [esdata-b3-cluster] [gc][old][537][5] duration [1.6m], collections [1]/[1.6m], total [1.6m]/[3m], memory [27.5gb]->[28.4gb]/[29.7gb], all_pools {[young] [449.4kb]->[712mb]/[1.8gb]}{[survivor] [232.9mb]->[0b]/[232.9mb]}{[old] [27.3gb]->[27.7gb]/[27.7gb]}

[2021-06-16T21:11:26,065][WARN ][o.e.m.j.JvmGcMonitorService] [esdata-b3] [gc][old][535][4] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[1.4m], memory [27.2gb]->[26.2gb]/[29.7gb], all_pools {[young] [50.7mb]->[39.5mb]/[1.8gb]}{[survivor] [232.9mb]->[0b]/[232.9mb]}{[old] [27gb]->[26.2gb]/[27.7gb]}
... (more logs in between) ...
[2021-06-16T21:09:39,557][WARN ][o.e.m.j.JvmGcMonitorService] [esdata-b3-cluster] [gc][young][523][30] duration [1.6s], collections [1]/[1.9s], total [1.6s]/[38.6s], memory [14.9gb]->[15.9gb]/[29.7gb], all_pools {[young] [11.7mb]->[1.9mb]/[1.8gb]}{[survivor] [232.9mb]->[232.9mb]/[232.9mb]}{[old] [14.6gb]->[15.7gb]/[27.7gb]}

I think you're misunderstanding the circuit breaker: it does not account for every memory consumer, so it will sometimes miss things. Note that the 6.8 series was released over two years ago, so you're missing out on two years of further development in this area. Recent versions track memory usage much more accurately.


Thanks. Would it miss the high-cardinality terms aggregation? That is the specific aggregation that's tipping over the node (high memory usage).

Also, please let me know if there is detailed documentation about the circuit breaker that I can read (what it accounts for and what it does not).

I'm not sure, but from your account it sounds like it does.

The big change in newer versions is described in this blog post:

With that change the parent circuit breaker effectively tracks everything on the heap.
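Concretely, if I recall correctly, 7.x enables a real-memory parent breaker by default; the relevant settings (shown with their 7.x defaults) are:

    indices.breaker.total.use_real_memory: true
    indices.breaker.total.limit: 95%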

Unfortunately, an upgrade to 7.x is only on the roadmap at this point, given that it does not support a transport client and also requires moving away from types. So we have to continue with this version for a while. I hope Elastic continues to support its customers on version 6.8.

If there is an alternative way to protect the cluster from high-cardinality requests (bucket explosions), please let me know.

This is inaccurate:

  • The main change in the removal of mapping types already happened, in 6.0. The only changes in this area in subsequent versions relate to adjusting any APIs that unnecessarily mention the (now-unique) type.

  • The Java transport client is supported in all 7.x versions. It's deprecated, indicating our intention to remove support for it in 8.0, but that doesn't affect anything in 7.x.


Here's what I read for types:

Indices created in Elasticsearch 7.0.0 or later no longer accept a _default_ mapping. Indices created in 6.x will continue to function as before in Elasticsearch 6.x. Types are deprecated in APIs in 7.0, with breaking changes to the index creation, put mapping, get mapping, put template, get template and get field mappings APIs.

We are using the default type in 6.8, so it should keep working.

We are using the elastic4s Scala client for the APIs, and they stopped supporting updates to the TCP (transport) client, so we need to update it.

Yes, that sounds about right. There's no need for a _default_ mapping now that indices only have one type, nor is there any need to mention types in the APIs. These don't need any structural changes to adopt; you've already done the hard bit by moving to single-type indices in 6.x.
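For example, a single-type index in 6.8 versus 7.x differs only in whether the type name appears as a wrapper in the mapping (the field names here are purely illustrative).

In 6.8, with the single _doc type:

    PUT my-index
    {
      "mappings": {
        "_doc": {
          "properties": { "user_id": { "type": "keyword" } }
        }
      }
    }

In 7.x, with no type in the mapping:

    PUT my-index
    {
      "mappings": {
        "properties": { "user_id": { "type": "keyword" } }
      }
    }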

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.