In flight circuit breaker explanation

Hello!

I have a question about the "in flight requests" circuit breaker.
What exactly does it measure? For instance, I'm sharing a screenshot in which the in-flight requests circuit breaker usage increases, and during that time I had bulk indexing rejections.

I couldn't find in the documentation exactly what is included in this metric. After 11:30, when the in-flight requests decreased, my bulk indexing rejections disappeared.

In summary, I don't understand where this "traffic" comes from. Is it shard relocation? Is it write or read requests? And how does it differ from the request circuit breaker?

FYI, I run ES 7.0.1 with Java 12.



Can anyone explain what exactly the in-flight requests circuit breaker is, and why it increases on only one node when my cluster is pretty well balanced?


It's tracking the total size of all in-flight requests on the node, i.e., the total size of all the messages that a node has received but to which it has not yet responded.

The request circuit breaker tracks per-request data structures (for example, memory used for calculating aggregations during a request).
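To see both breakers side by side on every node, the breaker section of the nodes stats API can be read directly. Below is a minimal Python sketch, assuming the response has the usual `nodes.<id>.breakers.<name>` layout with `estimated_size_in_bytes`, `limit_size_in_bytes` and `tripped` fields. The helper names `fetch_json` and `breaker_usage` are made up for this example, and the sketch does no authentication or TLS handling.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body (no auth or TLS handling here)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def breaker_usage(stats, breakers=("in_flight_requests", "request")):
    """Return one row per (node, breaker) from a _nodes/stats/breaker response."""
    rows = []
    for node in stats.get("nodes", {}).values():
        for name in breakers:
            b = node.get("breakers", {}).get(name)
            if b is None:
                continue  # this node did not report that breaker
            limit = b["limit_size_in_bytes"]
            used = b["estimated_size_in_bytes"]
            rows.append({
                "node": node.get("name"),
                "breaker": name,
                "used_bytes": used,
                "limit_bytes": limit,
                "used_pct": round(100.0 * used / limit, 1) if limit > 0 else None,
                "tripped": b.get("tripped", 0),
            })
    # Highest usage first, so an outlier node stands out.
    return sorted(rows, key=lambda r: r["used_bytes"], reverse=True)
```

For example, `breaker_usage(fetch_json("http://localhost:9200/_nodes/stats/breaker"))` would list the nodes with the most in-flight data first. The host and port are placeholders for whatever node you can reach.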

Thanks @DavidTurner. Do you have any idea why it increases on only one node when my shards are well balanced?

Are you balancing your client requests evenly across all your nodes? Is this node running more slowly than the others?

Yes. In front of my hot nodes I have two coordinating nodes, which sit behind 4 nginx proxies.
So the traffic should be spread evenly across the two coordinating nodes, and the coordinating nodes should spread the traffic across my hot nodes.

That's why I don't understand what happened.
I know that indexing into one shard costs one CPU thread; I have 12 threads per node and only 6 shards being written to.
The write thread pool was constantly busy, with all 12 threads active.
My disks are 3 SSDs of 500 GB each.
The indexing rate was ~6,000 requests per second on this node, which is not much; I have already had 15,000 requests per second on another node without issue.

A node with more in-flight data than its peers is either serving more requests or else it's serving them more slowly. Can you use APIs like GET _nodes/stats to work out which of these applies?
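One rough way to tell "more requests" from "slower requests" is to put the write thread pool and indexing counters of each node next to each other. The sketch below works on an already-parsed `GET _nodes/stats` response and assumes the usual `thread_pool.write` and `indices.indexing` sections; `node_load_summary` is a hypothetical helper name. These counters are cumulative since node start, so comparing two snapshots taken a minute apart is more telling than a single reading.

```python
def node_load_summary(stats):
    """Summarise write-pool and indexing counters per node from GET _nodes/stats."""
    summary = {}
    for node in stats.get("nodes", {}).values():
        write = node.get("thread_pool", {}).get("write", {})
        indexing = node.get("indices", {}).get("indexing", {})
        ops = indexing.get("index_total", 0)
        millis = indexing.get("index_time_in_millis", 0)
        summary[node.get("name")] = {
            "write_active": write.get("active", 0),
            "write_queue": write.get("queue", 0),
            "write_rejected": write.get("rejected", 0),
            "index_ops": ops,
            # Average time spent per indexing operation since the node started.
            "ms_per_index_op": millis / ops if ops else None,
        }
    return summary
```

A node with a similar `index_ops` count to its peers but a clearly higher `ms_per_index_op` is probably serving requests more slowly; a node with a much higher `index_ops` count is probably receiving more of them.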

Can you describe your architecture in more detail? Is the problematic node a coordinating node or a data node or something else? Is it perhaps the elected master node? Can you show a comparison between the problematic node and another node that you expect to be identical?

I would also look more deeply at whether the client traffic is really being balanced properly across your nodes. It's not always easy to balance load evenly, particularly if you're using long-running HTTP connections or some kind of stickiness.
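As a first check on client balancing, the `http` section of the same nodes stats response reports how many HTTP connections each node currently holds. This is only a sketch, with `http_connection_share` as a hypothetical helper name, and it assumes the `http.current_open` field is present. Open connections are not the same thing as request volume, since a few long-lived connections can carry most of the traffic, so treat the result as a hint rather than proof.

```python
def http_connection_share(stats):
    """Return each node's count and share of currently open HTTP connections."""
    counts = {}
    for node in stats.get("nodes", {}).values():
        counts[node.get("name")] = node.get("http", {}).get("current_open", 0)
    total = sum(counts.values())
    return {
        name: {"current_open": n, "share": n / total if total else 0.0}
        for name, n in counts.items()
    }
```

With two coordinating nodes behind the proxies you would expect shares close to 0.5 each; a split such as 0.75 / 0.25 suggests the proxies are pinning connections to one node.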

By the way, it isn't true that indexing into one shard uses only one thread: there can be many threads indexing into each shard at the same time.
