Circuit breaking exception - version 7.7.0

We have configured a 3-node cluster (2 master nodes and 1 data node). We have two sites, prod and DR, and between prod and DR we have configured replication via Kafka.

We have observed the errors below in Logstash and Elasticsearch on the DR site. Please help if anyone has a solution.

Logstash:

[2021-08-05T09:51:44,040][INFO ][logstash.outputs.elasticsearch][main][634f22b0afc3af812df90fcd9873c667fb3a3481e393f6b139ed4d1a1a6cbde0] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [20739871956/19.3gb], which is larger than the limit of [20401094656/19gb], real usage: [20739756616/19.3gb], new bytes reserved: [115340/112.6kb], usages [request=0/0b, fielddata=17534/17.1kb, in_flight_requests=115340/112.6kb, accounting=41630092/39.7mb]", "bytes_wanted"=>20739871956, "bytes_limit"=>20401094656, "durability"=>"PERMANENT"})
[2021-08-05T09:51:44,041][INFO ][logstash.outputs.elasticsearch][main][634f22b0afc3af812df90fcd9873c667fb3a3481e393f6b139ed4d1a1a6cbde0] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>13}
[2021-08-05T09:51:44,102][INFO ][logstash.outputs.elasticsearch][main][cdaa1f772bd03dd6fa85d04fee4566660bd9c92ab3a78c1488c1a41d42baffc0] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [20739758688/19.3gb], which is larger than the limit of [20401094656/19gb], real usage: [20739756616/19.3gb], new bytes reserved: [2072/2kb], usages [request=0/0b, fielddata=17534/17.1kb, in_flight_requests=2072/2kb, accounting=41630092/39.7mb]", "bytes_wanted"=>20739758688, "bytes_limit"=>20401094656, "durability"=>"PERMANENT"})
[2021-08-05T09:51:44,103][INFO ][logstash.outputs.elasticsearch][main][cdaa1f772bd03dd6fa85d04fee4566660bd9c92ab3a78c1488c1a41d42baffc0] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}
[2021-08-05T09:53:03,341][WARN ][logstash.outputs.elasticsearch][main][6242f8bf53d6d507d1bd30d602577d78c5526db79dbc9812f3eff84ee4b1fd22] Could not index event to Elasticsearch. {:status=>404, :action=>["index", {:_id=>nil, :_index=>"imon-2021.08.05", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x145580c>], :response=>{"index"=>{"_index"=>"imon-2021.08.05", "_type"=>"_doc", "_id"=>"HkUVFXsBfmjxxyqhT-tp", "status"=>404, "error"=>{"type"=>"shard_not_found_exception", "reason"=>"no such shard", "index_uuid"=>"xHoxZsqOTP2XArHyCITDOQ", "shard"=>"0", "index"=>"imon-2021.08.05"}}}}
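The byte counts in these 429 responses can be checked directly. Below is a minimal Python sketch that extracts them from the first reason string above and confirms that "would be" equals "real usage" plus "new bytes reserved". It assumes the parent breaker is at the 7.x default of 95% of the heap (`indices.breaker.total.limit` with real memory accounting enabled). Under that assumption, the 19gb limit implies a 20 GiB heap. The sketch also shows that real heap usage was already about 323 MiB over the limit before the 112.6kb request was added. The helper names `parse_parent_breaker` and `summarize` are hypothetical.

```python
import re

# Reason string copied from the first Logstash 429 response in the log above.
REASON = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[20739871956/19.3gb], which is larger than the limit of "
    "[20401094656/19gb], real usage: [20739756616/19.3gb], "
    "new bytes reserved: [115340/112.6kb], usages [request=0/0b, "
    "fielddata=17534/17.1kb, in_flight_requests=115340/112.6kb, "
    "accounting=41630092/39.7mb]"
)

# Assumption: parent breaker at the 7.x default of 95% of the heap
# (indices.breaker.total.limit when real memory accounting is enabled).
PARENT_LIMIT_PERCENT = 95


def parse_parent_breaker(reason: str) -> dict:
    """Extract the raw byte counts from a [parent] circuit breaker reason."""

    def grab(label: str) -> int:
        match = re.search(re.escape(label) + r"\s*\[(\d+)/", reason)
        if match is None:
            raise ValueError(f"field not found: {label}")
        return int(match.group(1))

    return {
        "would_be": grab("would be"),
        "limit": grab("limit of"),
        "real_usage": grab("real usage:"),
        "new_bytes": grab("new bytes reserved:"),
    }


def summarize(reason: str) -> dict:
    """Derive heap size and overshoot from a parent breaker reason."""
    fields = parse_parent_breaker(reason)
    # The parent check compares current real heap usage + new reservation to the limit.
    if fields["would_be"] != fields["real_usage"] + fields["new_bytes"]:
        raise ValueError("would_be does not equal real_usage + new_bytes")
    heap_bytes = fields["limit"] * 100 // PARENT_LIMIT_PERCENT
    return {
        "heap_gib": heap_bytes / 2**30,
        "over_limit_before_request": fields["real_usage"] - fields["limit"],
        "request_bytes": fields["new_bytes"],
    }


if __name__ == "__main__":
    print(summarize(REASON))
    # {'heap_gib': 20.0, 'over_limit_before_request': 338661960, 'request_bytes': 115340}
```

The same functions can be applied to the data node messages. For example, in the `per-rep-s03` entry the real usage is only about 3 MB over the limit.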

Elasticsearch (data node):

[2021-08-05T09:47:05,616][WARN ][o.e.a.b.TransportShardBulkAction] [per-rep-s03] [[imon-2021.08.05][0]] failed to perform indices:data/write/bulk[s] on replica [imon-2021.08.05][0], node[MjvAUYNuTs61rUfxe4rtLw], [R], s[STARTED], a[id=TzrBrgaLQ0GxWaJYMtBmww]
org.elasticsearch.transport.RemoteTransportException: [per-rep-s01][192.25.41.12:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [20404240050/19gb], which is larger than the limit of [20401094656/19gb], real usage: [20404212296/19gb], new bytes reserved: [27754/27.1kb], usages [request=32880/32.1kb, fielddata=17534/17.1kb, in_flight_requests=901894/880.7kb, accounting=41626352/39.6mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:347) ~[elasticsearch-7.7.0.jar:7.7.0]

Elasticsearch (master node):

[2021-08-05T21:57:40,398][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [inip-2021.08.05][0], node[eM22fGYZS46Ea5fAc76yrA], [P], s[STARTED], a[id=8vdr4w_lSOO5iPbUOY3JVg], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,399][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [rstraffic-2021.08.05][0], node[gAptaWnIQUKzRwthC6wutg], [P], s[STARTED], a[id=ZNwYd9hSTb-2cBkLq9tEWQ], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,400][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [imon-2021.08.05][0], node[gAptaWnIQUKzRwthC6wutg], [P], s[STARTED], a[id=Eqwwssh0SLmZjEzOILyL2A], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,406][INFO ][o.e.c.r.a.AllocationService] [per-rep-s01] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[inip-2021.08.05][0], [rstraffic-2021.08.05][0], [imon-2021.08.05][0]]]).
[2021-08-05T21:57:40,477][INFO ][o.e.i.s.IndexShard       ] [per-rep-s01] [rstraffic-2021.08.05][0] primary-replica resync completed with 0 operations

Welcome to our community! :smiley:
Please format your code/logs/config using the </> button or Markdown-style backticks. It makes things easier to read, which helps us help you.

It looks like you are sending a request that might have a larger than usual event in there.
Are you able to find anything in your data that might account for that?
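One way to answer this is to scan a sample of the events for unusually large ones. Below is a minimal Python sketch, assuming the events can be exported as newline-delimited JSON (for example, by consuming a sample from the Kafka topic into a file). The file name, the 100 KiB threshold and the helper names are hypothetical. Size is measured as compact UTF-8 JSON, which only approximates what Logstash actually sends in a bulk request.

```python
import json
import sys
from typing import Iterable, List, Tuple

# Hypothetical threshold: flag any event whose JSON encoding exceeds 100 KiB.
SIZE_THRESHOLD_BYTES = 100 * 1024


def event_size_bytes(event: dict) -> int:
    """Approximate wire size of one event as compact UTF-8 JSON."""
    encoded = json.dumps(event, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def find_oversized(
    lines: Iterable[str], threshold: int = SIZE_THRESHOLD_BYTES
) -> List[Tuple[int, int]]:
    """Return (line_number, size) for each JSON line above threshold, largest first."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        size = event_size_bytes(json.loads(line))
        if size > threshold:
            hits.append((lineno, size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_large_events.py events.ndjson
    with open(sys.argv[1], encoding="utf-8") as handle:
        for lineno, size in find_oversized(handle):
            print(f"line {lineno}: {size} bytes")
```

Sorting largest-first puts the most likely culprits at the top. The threshold should be adjusted to whatever is unusual for this data.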

We have observed that ELK sends data between the cluster nodes. When the data is too large, it throws a circuit breaking exception and the cluster goes YELLOW, but within a few seconds it returns to the normal state.

Why does it send (or replicate) such a large amount of data between the cluster nodes at once?

Once the data transmission is completed, it returns to the normal state.

[2021-08-09T01:11:30,001][WARN ][o.e.a.b.TransportShardBulkAction] [rep-s02] [[rstraffic-2021.08.08][0]] failed to perform indices:data/write/bulk[s] on replica [rstraffic-2021.08.08][0], node[v2-ht7q_RcuEQWt-P0yJmg], [R], s[STARTED], a[id=eUpGCnydT3iIMjm76Qg3zQ]
org.elasticsearch.transport.RemoteTransportException: [rep-s03][192.25.41.12:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [20465471546/19gb], which is larger than the limit of [20401094656/19gb], real usage: [20465468584/19gb], new bytes reserved: [2962/2.8kb], usages [request=0/0b, fielddata=11621082/11mb, in_flight_requests=9038/8.8kb, accounting=55508596/52.9mb]
[2021-08-09T01:11:30,013][INFO ][o.e.c.r.a.AllocationService] [rep-s02] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[rstraffic-2021.08.08][0]]]).
[2021-08-09T01:11:32,895][INFO ][o.e.c.r.a.AllocationService] [rep-s02] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[rstraffic-2021.08.08][0]]]).
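To measure how long the cluster stays YELLOW, the health-change lines can be paired up. Below is a minimal Python sketch. It assumes the log lines start with the `[yyyy-MM-ddTHH:mm:ss,SSS]` timestamp shown above and contain the `Cluster health status changed from [X] to [Y]` message. The function names are hypothetical. For the two lines above, it reports 2.882 seconds.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Matches the AllocationService health-change message seen in the log above.
HEALTH_RE = re.compile(r"Cluster health status changed from \[(\w+)\] to \[(\w+)\]")

SAMPLE = [
    "[2021-08-09T01:11:30,013][INFO ][o.e.c.r.a.AllocationService] [rep-s02] "
    "Cluster health status changed from [GREEN] to [YELLOW] "
    "(reason: [shards failed [[rstraffic-2021.08.08][0]]]).",
    "[2021-08-09T01:11:32,895][INFO ][o.e.c.r.a.AllocationService] [rep-s02] "
    "Cluster health status changed from [YELLOW] to [GREEN] "
    "(reason: [shards started [[rstraffic-2021.08.08][0]]]).",
]


def log_timestamp(line: str) -> datetime:
    """Parse the leading [yyyy-MM-ddTHH:mm:ss,SSS] timestamp of a log line."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S,%f")


def non_green_periods(lines: Iterable[str]) -> List[Tuple[datetime, datetime, timedelta]]:
    """Pair each transition away from GREEN with the next transition back to GREEN."""
    periods = []
    started = None
    for line in lines:
        match = HEALTH_RE.search(line)
        if match is None:
            continue  # not a health-change line
        before, after = match.groups()
        when = log_timestamp(line)
        if before == "GREEN" and after != "GREEN":
            started = when
        elif after == "GREEN" and started is not None:
            periods.append((started, when, when - started))
            started = None
    return periods


if __name__ == "__main__":
    for start, end, duration in non_green_periods(SAMPLE):
        print(start, end, duration.total_seconds())
```

Running this over a full day's master log lets the YELLOW periods be lined up against the circuit breaker warnings on the data node.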

Hi Team,
Could you please provide an update on this?