We have configured a 3-node cluster (2 master nodes and 1 data node) at each of two sites, prod and DR. Replication between prod and DR is done via Kafka.
We have observed the errors below in Logstash and Elasticsearch on the DR site. Please help if anyone has a solution.
Logstash:
[2021-08-05T09:51:44,040][INFO ][logstash.outputs.elasticsearch][main][634f22b0afc3af812df90fcd9873c667fb3a3481e393f6b139ed4d1a1a6cbde0] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [20739871956/19.3gb], which is larger than the limit of [20401094656/19gb], real usage: [20739756616/19.3gb], new bytes reserved: [115340/112.6kb], usages [request=0/0b, fielddata=17534/17.1kb, in_flight_requests=115340/112.6kb, accounting=41630092/39.7mb]", "bytes_wanted"=>20739871956, "bytes_limit"=>20401094656, "durability"=>"PERMANENT"})
[2021-08-05T09:51:44,041][INFO ][logstash.outputs.elasticsearch][main][634f22b0afc3af812df90fcd9873c667fb3a3481e393f6b139ed4d1a1a6cbde0] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>13}
[2021-08-05T09:51:44,102][INFO ][logstash.outputs.elasticsearch][main][cdaa1f772bd03dd6fa85d04fee4566660bd9c92ab3a78c1488c1a41d42baffc0] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [20739758688/19.3gb], which is larger than the limit of [20401094656/19gb], real usage: [20739756616/19.3gb], new bytes reserved: [2072/2kb], usages [request=0/0b, fielddata=17534/17.1kb, in_flight_requests=2072/2kb, accounting=41630092/39.7mb]", "bytes_wanted"=>20739758688, "bytes_limit"=>20401094656, "durability"=>"PERMANENT"})
[2021-08-05T09:51:44,103][INFO ][logstash.outputs.elasticsearch][main][cdaa1f772bd03dd6fa85d04fee4566660bd9c92ab3a78c1488c1a41d42baffc0] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}
[2021-08-05T09:53:03,341][WARN ][logstash.outputs.elasticsearch][main][6242f8bf53d6d507d1bd30d602577d78c5526db79dbc9812f3eff84ee4b1fd22] Could not index event to Elasticsearch. {:status=>404, :action=>["index", {:_id=>nil, :_index=>"imon-2021.08.05", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x145580c>], :response=>{"index"=>{"_index"=>"imon-2021.08.05", "_type"=>"_doc", "_id"=>"HkUVFXsBfmjxxyqhT-tp", "status"=>404, "error"=>{"type"=>"shard_not_found_exception", "reason"=>"no such shard", "index_uuid"=>"xHoxZsqOTP2XArHyCITDOQ", "shard"=>"0", "index"=>"imon-2021.08.05"}}}}
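Besides the 429 circuit-breaker retries, Logstash also gets a 404 shard_not_found_exception for imon-2021.08.05, so the primary shard seems to have been failed around the same time. A minimal diagnostic sketch (assuming the cluster is reachable on http://localhost:9200 without TLS/auth, which is an assumption on our side) asking Elasticsearch why shard 0 of that index is in its current state:

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: default HTTP port, no TLS/auth

# Ask the cluster allocation explain API about shard 0 of the index that
# Logstash could not index into.
body = json.dumps({
    "index": "imon-2021.08.05",
    "shard": 0,
    "primary": True,
}).encode("utf-8")

req = urllib.request.Request(
    f"{ES}/_cluster/allocation/explain",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    explain = json.load(resp)

print(json.dumps(explain, indent=2))
```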
Elasticsearch (data node):
[2021-08-05T09:47:05,616][WARN ][o.e.a.b.TransportShardBulkAction] [per-rep-s03] [[imon-2021.08.05][0]] failed to perform indices:data/write/bulk[s] on replica [imon-2021.08.05][0], node[MjvAUYNuTs61rUfxe4rtLw], [R], s[STARTED], a[id=TzrBrgaLQ0GxWaJYMtBmww]
org.elasticsearch.transport.RemoteTransportException: [per-rep-s01][192.25.41.12:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [20404240050/19gb], which is larger than the limit of [20401094656/19gb], real usage: [20404212296/19gb], new bytes reserved: [27754/27.1kb], usages [request=32880/32.1kb, fielddata=17534/17.1kb, in_flight_requests=901894/880.7kb, accounting=41626352/39.6mb]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:347) ~[elasticsearch-7.7.0.jar:7.7.0]
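For context on the numbers above: the 19gb limit (20401094656 bytes) is 95% of a 20GiB heap, i.e. the default parent breaker limit in 7.x with the real-memory circuit breaker, so the node's heap is almost fully used when these bulk requests arrive. A small sketch (again assuming http://localhost:9200 and no auth) of how the parent breaker can be watched against its limit on each node:

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: default HTTP port, no TLS/auth

# Fetch per-node breaker statistics and print parent breaker usage vs. limit.
with urllib.request.urlopen(f"{ES}/_nodes/stats/breaker") as resp:
    stats = json.load(resp)

for node_id, node in stats["nodes"].items():
    parent = node["breakers"]["parent"]
    used = parent["estimated_size_in_bytes"]
    limit = parent["limit_size_in_bytes"]
    print(f"{node['name']}: parent breaker {used / 2**30:.1f}GiB "
          f"of {limit / 2**30:.1f}GiB ({100 * used / limit:.0f}%), "
          f"tripped {parent['tripped']} times")
```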
Elasticsearch (master node):
[2021-08-05T21:57:40,398][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [inip-2021.08.05][0], node[eM22fGYZS46Ea5fAc76yrA], [P], s[STARTED], a[id=8vdr4w_lSOO5iPbUOY3JVg], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,399][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [rstraffic-2021.08.05][0], node[gAptaWnIQUKzRwthC6wutg], [P], s[STARTED], a[id=ZNwYd9hSTb-2cBkLq9tEWQ], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,400][WARN ][o.e.c.r.a.AllocationService] [per-rep-s01] failing shard [failed shard, shard [imon-2021.08.05][0], node[gAptaWnIQUKzRwthC6wutg], [P], s[STARTED], a[id=Eqwwssh0SLmZjEzOILyL2A], message [master {per-rep-s01}{MjvAUYNuTs61rUfxe4rtLw}{LfvqR9UcQfmJAJ-h2AVevQ}{192.25.41.12}{192.25.41.12:9300}{dmt}{xpack.installed=true, transform.node=true} has not removed previously failed shard. resending shard failure], failure [Unknown], markAsStale [true]]
[2021-08-05T21:57:40,406][INFO ][o.e.c.r.a.AllocationService] [per-rep-s01] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[inip-2021.08.05][0], [rstraffic-2021.08.05][0], [imon-2021.08.05][0]]]).
[2021-08-05T21:57:40,477][INFO ][o.e.i.s.IndexShard ] [per-rep-s01] [rstraffic-2021.08.05][0] primary-replica resync completed with 0 operations
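On the master side the shards keep being failed and the cluster drops to YELLOW, which matches the shard_not_found errors Logstash reports. If it helps, a sketch (same assumptions about endpoint and auth as above) of how we can check overall health and list any shards that ended up unassigned:

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: default HTTP port, no TLS/auth

# Overall cluster health: status and number of unassigned shards.
with urllib.request.urlopen(f"{ES}/_cluster/health") as resp:
    health = json.load(resp)
print(f"status={health['status']} unassigned_shards={health['unassigned_shards']}")

# List shards that are not STARTED, with the reason they are unassigned.
url = f"{ES}/_cat/shards?h=index,shard,prirep,state,unassigned.reason&format=json"
with urllib.request.urlopen(url) as resp:
    shards = json.load(resp)
for s in shards:
    if s["state"] != "STARTED":
        print(s["index"], s["shard"], s["prirep"], s["state"],
              s.get("unassigned.reason"))
```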