Error code 429 - circuit_breaking_exception

Hi Elastic team,

I got this error from Logstash logs.

[2019-10-07T07:40:55,341][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [16414928216/15.2gb], which is larger than the limit of [16320875724/15.1gb], real usage: [16414925088/15.2gb], new bytes reserved: [3128/3kb]", "bytes_wanted"=>16414928216, "bytes_limit"=>16320875724, "durability"=>"TRANSIENT"})

I know there are other posts about this error, but I don't clearly understand the explanations there.

My cluster hardware specification details

  • Master: 3 nodes (4 CPU, 32GB RAM, 16GB heap)
  • Hot-data (and ingest): 12 nodes (8 CPU, 64GB RAM, 32GB heap)
  • Warm-data: 3 nodes (8 CPU, 64GB RAM, 32GB heap)
  • Cold-data: 3 nodes (8 CPU, 64GB RAM, 32GB heap)

My cluster usage

  • Indexing rate: 5,000 - 10,000+ / sec (Only primary shards)
  • Indices: 3500+
  • Primary shards: 4594
  • Active shards: 9192
  • Shards / hot-data node: ~300 (Max is 400+ during ILM)
  • Shards / warm-data node: ~600 - 700 (Max is 800+ during ILM) (Indices are read-only)
  • Shards / cold-data node: ~1100 - 1200 (Max is 1300+ during ILM) (Indices are read-only and frozen)

In the other posts, the suggestion for this error is to increase the heap size.
However, I'm not sure which nodes I should increase the heap size on: master, or hot-data (ingest)?

Thank you
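For anyone trying to narrow this down, one way to see which node type is actually running low on heap is to check per-node heap usage and parent circuit-breaker statistics. A sketch, assuming the cluster listens on `localhost:9200` (adjust the host to one of your nodes):

```shell
# Heap usage per node, with roles (m = master, d = data, i = ingest)
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,heap.max'

# Parent circuit-breaker usage and limit per node
curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty'
```

The node whose `parent` breaker `estimated_size` sits close to its `limit_size` is the one tripping the 429s.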

@worapojc Very recently I had circuit breaker exceptions that were caused by the heap setting in jvm.options on our ingest nodes. Those only had 4GB RAM. I increased it to 12GB and haven't seen any circuit breaker errors since.

Thank you. It's strange in my case: the ingest nodes already have a 32GB heap; only the master nodes have a 16GB heap.

@worapojc Could it be that you are sending those requests to the master node(s) given that the circuit breaker exception contains a reference to ~16GB of heap?
I think removing the hosts of the master nodes from your logstash configuration and sending requests to your hot nodes might help here.

In case it doesn't feel free to share your jvm.options so I can take a look and see if something can be optimized there.
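For reference, a Logstash `elasticsearch` output restricted to the hot nodes would look something like this (the hostnames are placeholders, not from the thread):

```
output {
  elasticsearch {
    # Point only at the hot-data (ingest) nodes, never the masters
    hosts => ["http://hot-data-01:9200", "http://hot-data-02:9200", "http://hot-data-03:9200"]
  }
}
```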

Thanks, Armin. The logstash output configuration only has the hot-data nodes.

Currently, I have increased the heap of the master nodes to 32GB.
That issue has been fixed, but another one remains.

Some API calls time out:

  • _cat/shards
  • _cat/nodes
  • _cat/indices
  • _cluster/stats

The response is:

{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}

@worapojc Elasticsearch never returns a 504 from its APIs. The issue must be coming from something (HTTP proxy of some sort or Kibana) between the client and ES. You shouldn't see those errors when directly calling the ES REST API.
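To confirm where the latency is, the calls can be timed directly against Elasticsearch, bypassing any proxy or Kibana. A sketch, assuming a node at `localhost:9200`:

```shell
for path in _cat/shards _cat/nodes _cat/indices _cluster/stats; do
  # curl's %{time_total} prints the total request time in seconds
  curl -s -o /dev/null -w "${path}: %{time_total}s\n" "http://localhost:9200/${path}"
done
```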

Thanks @Armin_Braun.

I've tested the APIs directly. The response times are high.

  • _cat/shards : 92.859375s
  • _cat/nodes : 90.799507s
  • _cat/indices : 97.222450s
  • _cluster/stats : 84.900175s

The cluster still works for indexing and searching, but this API response time issue has persisted for a week. Kibana monitoring is malfunctioning.

How to resolve this issue?

@willemdh, this is because ES keeps some data structures permanently on the heap, and their size is closely related to the amount of data you have indexed. We have experimented a lot with this, and the only solution we found is, first, to follow the ES tuning suggestions for indexed data; second, to design the system to scale horizontally so that each data node holds a smaller amount of data, which in turn consumes less heap.
The right heap size differs per use case. Increasing RAM beyond a certain point is not a solution on its own, but the more RAM you give the system to play with, the faster your query responses are.

It's hard to tell what causes this; there are multiple possible causes. I would look into whether one or more of your nodes is abnormally slow for some reason (e.g. they could be swapping, which would likely show up as very long GC times and corresponding warnings in their logs).
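To check for long GC pauses without digging through every node's logs, the per-node JVM stats are one place to look. A sketch, assuming a node at `localhost:9200`:

```shell
# Old-generation GC collection counts and cumulative time per node;
# a large collection_time_in_millis relative to node uptime suggests GC pressure
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | grep -A3 '"old"'
```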

Thanks Armin. I did a rolling restart of all hot-data nodes. The APIs are fine now.