Circuit breaker in Elasticsearch

We are running an Elasticsearch cluster with 3 nodes.
Sometimes a heavy aggregation query comes in (run manually from Kibana Dev Tools) and one of the nodes becomes inaccessible.
The whole cluster then becomes inaccessible, because ES takes some time to remove that node.
So we restart that node to get the cluster running again.

Now, we have implemented circuit breaker in Elasticsearch with = 10GB
network.breaker.inflight_requests.limit = 10GB

Now, if I run the same heavy aggregation query, Elasticsearch becomes unresponsive again with a circuit breaker exception, and other indexing/search requests also start to throw exceptions.
Exception: [parent] Data too large, data for [<transport_request>] would be [10856018759/10.1gb], which is larger than the limit of [10737418240/10gb], usages [request=0/0b, fielddata=6552153321/6.1gb, in_flight_requests=723/723b.
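To make the numbers in that message easier to read, here is a minimal Python sketch that parses it and reports each usage as a share of the parent limit. The regular expressions are assumptions based only on the one message quoted above (which is itself truncated), so treat this as an illustration, not a general parser for Elasticsearch logs. It shows that fielddata alone accounts for roughly 6.1 GB of the 10 GB limit.

```python
import re

# Breaker message copied from the post above (it is truncated in the original).
MESSAGE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[10856018759/10.1gb], which is larger than the limit of "
    "[10737418240/10gb], usages [request=0/0b, fielddata=6552153321/6.1gb, "
    "in_flight_requests=723/723b."
)


def parse_breaker_message(message):
    """Return (would_be_bytes, limit_bytes, usages) parsed from the message."""
    would_be = int(re.search(r"would be \[(\d+)/", message).group(1))
    limit = int(re.search(r"limit of \[(\d+)/", message).group(1))
    # Each usage entry looks like name=<bytes>/<human readable size>.
    usages = {name: int(size) for name, size in re.findall(r"(\w+)=(\d+)/", message)}
    return would_be, limit, usages


would_be, limit, usages = parse_breaker_message(MESSAGE)
print(f"over the limit by {would_be - limit} bytes")
for name, size in usages.items():
    print(f"{name}: {size / limit:.1%} of the parent limit")
```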

Is there a way to control memory usage for a single request?
Should we lower the fielddata/query cache limit?

3 TB Disk
16 Core CPU
30 GB Heap to Elasticsearch
Avg heap utilization is ~27GB.

Which version of Elasticsearch are you using?

6.4 version

I'd recommend upgrading, 6.4 is far past EOL and there have been significant improvements since then.

That being said, your average heap usage is pretty high: 27GB/30GB = ~90%. It is generally recommended that average heap usage stay below 70%.
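As a quick check of that arithmetic, here is a small Python sketch using only the figures from this thread (30 GB heap, ~27 GB average usage) and the 70% guideline. The function name `heap_headroom` is made up for illustration. With these inputs, the target works out to about 21 GB of used heap per node, so roughly 6 GB would need to be freed or spread across more nodes.

```python
def heap_headroom(heap_gb, used_gb, target_ratio=0.70):
    """Return (current utilization, target usage in GB, GB over the target)."""
    utilization = used_gb / heap_gb
    target_gb = heap_gb * target_ratio
    # Clamp at zero so a healthy node does not report negative excess.
    excess_gb = max(0.0, used_gb - target_gb)
    return utilization, target_gb, excess_gb


# Figures taken from the question: 30 GB heap, ~27 GB average usage.
utilization, target_gb, excess_gb = heap_headroom(heap_gb=30, used_gb=27)
print(f"current utilization: {utilization:.0%}")
print(f"target usage: below {target_gb:.0f} GB")
print(f"heap to free per node: about {excess_gb:.0f} GB")
```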

Without knowing much about your cluster (and not knowing much about the 6.x releases anymore), you might want to look at adding some additional nodes to spread the load and increase total cluster heap. (But you should really look at upgrading.)

What is the full output of the cluster stats API?

What is the use case? Are you using time-based indices?

To decrease heap usage, should I reduce the fielddata cache limit?

I have pageviews data, and the indices are time-based (monthly). We have a visitorId that is unique to each user*browser, and I have ~30 million docs per month.

I have implemented the circuit breaker.

Then I execute the following query:

GET pageviews_data_m*/_search

After running this query, a circuit breaker exception occurs, and all other requests (insert/search) also get the same exception.

Is there a way to implement a circuit breaker at the request level?

Are you updating older data or are the monthly indices effectively read-only once the new index is created? If this is the case you may be able to lower heap usage by forcemerging old, read-only indices down to a single segment.

I am not updating data in old indices.

Also, is there a way to implement a circuit breaker at the request level?

I do not think there is a lot you can do in 6.4, but I have not used it in years. A lot has been improved with respect to circuit breakers and heap usage in newer versions though, so I would recommend you upgrade.

Making sure you do not have a lot of very small shards and forcemerging old indices down to a single segment can help reduce the amount of heap used. Making sure your mappings are optimised is also a good step to take.
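For the force merge step, a rough Python sketch of the request could look like the following. It only builds the call to the `_forcemerge` endpoint with `max_num_segments=1` and does not send it. The base URL and the index name `pageviews_data_m_old` are placeholders (the thread only shows the pattern `pageviews_data_m*`), so substitute the name of an index that is no longer written to.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def build_forcemerge_request(index, base_url=ES_URL):
    """Build (but do not send) a force merge request down to one segment."""
    url = f"{base_url}/{index}/_forcemerge?max_num_segments=1"
    return urllib.request.Request(url, method="POST")


# Hypothetical index name; use a monthly index that no longer receives writes.
request = build_forcemerge_request("pageviews_data_m_old")
print(request.get_method(), request.full_url)

# To actually run it against a cluster (the call blocks until the merge finishes):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```

Force merging is I/O intensive, so it is probably best run on one old index at a time during a quiet period, and only on indices that are no longer being written to.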
