Circuit break exception - version 7.7.0

ravindrababua · August 16, 2021, 3:46pm

We have configured a 3-node cluster (2 master nodes and 1 data node). We have two sites, prod and DR, and replication between prod and DR is configured via Kafka.

We have observed the errors below in Logstash and Elasticsearch on the DR site. Please help if anyone has a solution.

Logstash:

[2021-08-05T09:51:44,040][INFO ][logstash.outputs.elasticsearch][main][634f22b0afc3af812df90fcd9873c667fb3a3481e393f6b139ed4d1a1a6cbde0] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [20739871956/19.3gb], which is larger than the limit of [20401094656/19gb], real usage: [20739756616/19.3gb], new bytes reserved: [115340/112.6kb], usages [request=0/0b, fielddata=17534/17.1kb, in_flight_requests=115340/112.6kb, accounting=41630092/39.7mb]", "bytes_wanted"=>20739871956, "bytes_limit"=>20401094656, "durability"=>"PERMANENT"})
[2021-08-05T09:51:44,041][INFO ][logstash.outputs.elasticsearch][main][634f22b0afc3af812df90fcd9873c667fb3a3481e393f6b139ed4d1a1a6cbde0] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>13}
[2021-08-05T09:51:44,102][INFO ][logstash.outputs.elasticsearch][main][cdaa1f772bd03dd6fa85d04fee4566660bd9c92ab3a78c1488c1a41d42baffc0] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [20739758688/19.3gb], which is larger than the limit of [20401094656/19gb], real usage: [20739756616/19.3gb], new bytes reserved: [2072/2kb], usages [request=0/0b, fielddata=17534/17.1kb, in_flight_requests=2072/2kb, accounting=41630092/39.7mb]", "bytes_wanted"=>20739758688, "bytes_limit"=>20401094656, "durability"=>"PERMANENT"})
[2021-08-05T09:51:44,103][INFO ][logstash.outputs.elasticsearch][main][cdaa1f772bd03dd6fa85d04fee4566660bd9c92ab3a78c1488c1a41d42baffc0] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}
[2021-08-05T09:53:03,341][WARN ][logstash.outputs.elasticsearch][main][6242f8bf53d6d507d1bd30d602577d78c5526db79dbc9812f3eff84ee4b1fd22] Could not index event to Elasticsearch. {:status=>404, :action=>["index", {:_id=>nil, :_index=>"imon-2021.08.05", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x145580c>], :response=>{"index"=>{"_index"=>"imon-2021.08.05", "_type"=>"_doc", "_id"=>"HkUVFXsBfmjxxyqhT-tp", "status"=>404, "error"=>{"type"=>"shard_not_found_exception", "reason"=>"no such shard", "index_uuid"=>"xHoxZsqOTP2XArHyCITDOQ", "shard"=>"0", "index"=>"imon-2021.08.05"}}}}

Elasticsearch (data node):

[2021-08-05T09:47:05,616][WARN ][o.e.a.b.TransportShardBulkAction] [per-rep-s03] [[imon-2021.08.05][0]] failed to perform indices:data/write/bulk[s] on replica [imon-2021.08.05][0], node[MjvAUYNuTs61rUfxe4rtLw], [R], s[STARTED], a[id=TzrBrgaLQ0GxWaJYMtBmww]
org.elasticsearch.transport.RemoteTransportException: [per-rep-s01][192.25.41.12:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [20404240050/19gb], which is larger than the limit of [20401094656/19gb], real usage: [20404212296/19gb], new bytes reserved: [27754/27.1kb], usages [request=32880/32.1kb, fielddata=17534/17.1kb, in_flight_requests=901894/880.7kb, accounting=41626352/39.6mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:347) ~[elasticsearch-7.7.0.jar:7.7.0]

Elasticsearch (master node):

[2021-08-05T21:57:40,398][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [inip-2021.08.05][0], node[eM22fGYZS46Ea5fAc76yrA], [P], s[STARTED], a[id=8vdr4w_lSOO5iPbUOY3JVg], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,399][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [rstraffic-2021.08.05][0], node[gAptaWnIQUKzRwthC6wutg], [P], s[STARTED], a[id=ZNwYd9hSTb-2cBkLq9tEWQ], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,400][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [imon-2021.08.05][0], node[gAptaWnIQUKzRwthC6wutg], [P], s[STARTED], a[id=Eqwwssh0SLmZjEzOILyL2A], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,406][INFO ][o.e.c.r.a.AllocationService] [per-rep-s01] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[inip-2021.08.05][0], [rstraffic-2021.08.05][0], [imon-2021.08.05][0]]]).
[2021-08-05T21:57:40,477][INFO ][o.e.i.s.IndexShard       ] [per-rep-s01] [rstraffic-2021.08.05][0] primary-replica resync completed with 0 operations

warkolm · August 16, 2021, 10:17pm

Welcome to our community!
Please format your code/logs/config using the </> button, or markdown-style backticks. It makes things easier to read, which helps us help you.

It looks like you are sending a request that might have a larger than usual event in there.
Are you able to find anything in your data that might account for that?
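One way to check how much of the total comes from the incoming request is to pull the numbers out of the breaker message itself. The following is a minimal Python sketch, not an official tool: the function name `parse_breaker_message` and the regular expression are assumptions made for illustration, and the message text is copied from the first Logstash log line above.

```python
import re

# Breaker message copied from the first Logstash log line in this thread
# (only the part that carries the numbers we care about).
MESSAGE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[20739871956/19.3gb], which is larger than the limit of "
    "[20401094656/19gb], real usage: [20739756616/19.3gb], "
    "new bytes reserved: [115340/112.6kb]"
)

# Hypothetical helper: extracts the byte counts from a parent breaker message.
PATTERN = re.compile(
    r"would be \[(?P<wanted>\d+)/[^\]]+\].*?"
    r"limit of \[(?P<limit>\d+)/[^\]]+\].*?"
    r"real usage: \[(?P<real_usage>\d+)/[^\]]+\].*?"
    r"new bytes reserved: \[(?P<new_bytes>\d+)/[^\]]+\]"
)


def parse_breaker_message(message: str) -> dict:
    """Return the byte counts in a parent circuit breaker message as ints."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a parent circuit breaker message")
    return {name: int(value) for name, value in match.groupdict().items()}


fields = parse_breaker_message(MESSAGE)

# "would be" is the current real usage plus the bytes the new request needs.
print("wanted == real usage + new bytes:",
      fields["wanted"] == fields["real_usage"] + fields["new_bytes"])

# Bytes by which the heap was over the limit before this request was counted.
print("already over the limit by (bytes):",
      fields["real_usage"] - fields["limit"])

# Size of the request that triggered the breaker.
print("new request (bytes):", fields["new_bytes"])
```

For this particular message the request itself is reported as 112.6kb, while the real usage figure is already above the 19gb limit, so it is worth comparing both numbers before looking for a single oversized event.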

ravindrababua · August 25, 2021, 6:12pm

We have observed ELK sending data between the cluster nodes. When the data is too large, it throws a circuit breaker exception and the cluster goes YELLOW, but within a few seconds it returns to the normal state.

Why does it send (or replicate) such large data between the cluster nodes at one time?

Once the data transmission is completed, it returns to the normal state.

[2021-08-09T01:11:30,001][WARN ][o.e.a.b.TransportShardBulkAction] [rep-s02] [[rstraffic-2021.08.08][0]] failed to perform indices:data/write/bulk[s] on replica [rstraffic-2021.08.08][0], node[v2-ht7q_RcuEQWt-P0yJmg], [R], s[STARTED], a[id=eUpGCnydT3iIMjm76Qg3zQ]
org.elasticsearch.transport.RemoteTransportException: [rep-s03][192.25.41.12:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [20465471546/19gb], which is larger than the limit of [20401094656/19gb], real usage: [20465468584/19gb], new bytes reserved: [2962/2.8kb], usages [request=0/0b, fielddata=11621082/11mb, in_flight_requests=9038/8.8kb, accounting=55508596/52.9mb]

[2021-08-09T01:11:30,013][INFO ][o.e.c.r.a.AllocationService] [rep-s02] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[rstraffic-2021.08.08][0]]]).
[2021-08-09T01:11:32,895][INFO ][o.e.c.r.a.AllocationService] [rep-s02] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[rstraffic-2021.08.08][0]]]).
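The numbers in the exception above can be put in proportion with a little arithmetic. This is a rough Python sketch under one stated assumption: that the parent breaker limit is at its 7.x default of 95% of the JVM heap (`indices.breaker.total.limit` with real-memory accounting enabled). If that setting was changed on this cluster, the implied heap size below does not apply. The helper name `implied_heap_bytes` is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Values copied from the CircuitBreakingException in the log above.
LIMIT_BYTES = 20401094656      # "limit of [20401094656/19gb]"
WOULD_BE_BYTES = 20465471546   # "would be [20465471546/19gb]"
NEW_BYTES = 2962               # "new bytes reserved: [2962/2.8kb]"


def implied_heap_bytes(limit_bytes: int, limit_percent: int = 95) -> int:
    """Heap size implied by a parent breaker limit.

    ASSUMPTION: the limit is limit_percent of the heap (95 is the default
    when real-memory accounting is enabled). Integer math avoids float noise.
    """
    return limit_bytes * 100 // limit_percent


heap = implied_heap_bytes(LIMIT_BYTES)
overshoot = WOULD_BE_BYTES - LIMIT_BYTES

print(f"implied heap: {heap / GIB:.1f} GiB")
print(f"over the limit by: {overshoot / MIB:.1f} MiB")
print(f"of which the rejected replica request: {NEW_BYTES} bytes")
```

Under that assumption the limit corresponds to a 20 GiB heap, and the replica write that was rejected accounts for only 2962 bytes of the total. The breaker appears to trip because the heap was already nearly full when the request arrived, not because the replicated data itself was large.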

ravindrababua · September 13, 2021, 8:35am

Hi Team,
Could you please provide an update on this?

system · October 11, 2021, 8:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
