Circuit breaker to prevent ES client from having OOM problem

Hi all:

Our Elasticsearch cluster runs well for the most part, except that the client node crashes with an out-of-memory error when executing large queries. I think it crashes in the gather phase: when all the data nodes send their partial results back, the combined dataset gets too big to fit in the heap.

I searched for an appropriate circuit breaker to cancel the query if the data is potentially too large to fit in memory, especially on client nodes. I tried indices.breaker.request.limit on all ES nodes, but it sounds like it only applies to data nodes. Did I miss anything, or is this the expected behavior? If this is expected, is there any built-in solution to my problem? Thanks.
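For reference, here is a minimal sketch of how a setting like the one above could be applied dynamically through the cluster settings API. The host `http://localhost:9200`, the `40%` value, and the helper name `build_breaker_request` are assumptions for illustration only. The code just builds the HTTP request and does not send it.

```python
import json
import urllib.request


def build_breaker_request(host, limit):
    """Build (but do not send) a PUT to _cluster/settings for the request breaker."""
    # "transient" settings are lost on a full cluster restart;
    # "persistent" could be used instead to keep them.
    body = {"transient": {"indices.breaker.request.limit": limit}}
    return urllib.request.Request(
        url=f"{host}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and limit, chosen only for the example.
breaker_req = build_breaker_request("http://localhost:9200", "40%")
print(breaker_req.get_method(), breaker_req.full_url)
print(breaker_req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(breaker_req)` would actually send it to a live cluster.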

Unfortunately, circuit breakers are best-effort only. They don't cover all cases, even though we are trying to improve them in newer releases and are also adding changes that make out-of-memory errors less likely to happen in practice. What version are you using? For instance, 5.4 added the ability to reduce shard responses in batches when gathering responses from many shards at the same time. Are you on 5.4 or a more recent version?
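As a rough illustration of the batched reduction mentioned above: it is on by default, and the batch size can be tuned per request with the `batched_reduce_size` search parameter. The sketch below only builds such a request without sending it. The host, the index pattern `logs-*`, the value `64`, and the helper name `build_search_request` are assumptions for the example, not recommendations.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(host, index, query, batched_reduce_size):
    """Build (but do not send) a search request with a custom reduce batch size."""
    # A smaller batch size means the coordinating node reduces shard results
    # more often, holding fewer partial results in memory at once.
    params = urllib.parse.urlencode({"batched_reduce_size": batched_reduce_size})
    return urllib.request.Request(
        url=f"{host}/{index}/_search?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index pattern, and batch size.
search_req = build_search_request(
    "http://localhost:9200",
    "logs-*",
    {"query": {"match_all": {}}},
    batched_reduce_size=64,
)
print(search_req.get_method(), search_req.full_url)
```

This limits how many shard responses are buffered before a partial reduce, but it does not cap the size of each individual shard response, so it lowers the risk of an OOM rather than ruling it out.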

Hey Adrien:

Thanks for the reply! I should've mentioned in the original post that we are using 5.6.2. Can you point me to some references about shard response reduction? I'll do some research as well.

Sure, here is the change that introduced it.

Hmm, I thought it was a setting to flip to enable or disable it, but it sounds like it's enabled out of the box. Thanks for letting me know about this anyway, and feel free to let me know if there are any other options.
