Cluster Error: data too large - marking and sending shards

Hi, I had a power issue and the cluster was down for nearly 10 hours.

Now I'm seeing the following output:

http://serverip:9200/_cluster/health

cluster_name "cluster-name"
status "yellow"
timed_out false
number_of_nodes 13
number_of_data_nodes 12
active_primary_shards 434
active_shards 716
relocating_shards 9
initializing_shards 0
unassigned_shards 152
delayed_unassigned_shards 0
number_of_pending_tasks 70
number_of_in_flight_fetch 0
task_max_waiting_in_queue_millis 167
active_shards_percent_as_number 82.48847926267281
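For reference, these are the commands I've been running to pull that health output and to list the shards that are still unassigned (pointing at the same serverip node; the unassigned.reason column is just my guess at the most useful field to look at):

curl -s "http://serverip:9200/_cluster/health?pretty"
curl -s "http://serverip:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED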

Checking the logs, I found these errors:

Caused by: org.elasticsearch.transport.RemoteTransportException: [hostname6][10.240.36.150:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [4077609546/3.7gb], which is larger than the limit of [4013975142/3.7gb], real usage: [4077577728/3.7gb], new bytes reserved: [31818/31kb], usages [request=0/0b, fielddata=1626172697/1.5gb, in_flight_requests=32648/31.8kb, accounting=776288601/740.3mb]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:342) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.3.2.jar:7.3.2]

...

[2019-11-05T15:15:57,947][WARN ][o.e.i.c.IndicesClusterStateService] [hostnameB01] [clustername_metrics-raw_2019.09.13][0] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [clustername_metrics-raw_2019.09.13][0]: Recovery failed from {hostnameb6}{-TNL0xnsTpuWDzooT9bJtw}{D5PdSuGhQfy3ELk2a7_DZg}{10.240.36.150}{10.240.36.150:9300}{di}{ml.machine_memory=8356679680, ml.max_open_jobs=20, datacenter=b, xpack.installed=true} into {hostnameb1}{qOzCQhsPSD-MjRCyt8oSew}{gbiKUpwKS3iGxPtcRJFh8A}{10.240.36.145}{10.240.36.145:9300}{dim}{ml.machine_memory=8356687872, xpack.installed=true, ml.max_open_jobs=20, datacenter=b}
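Since the failure comes from the parent circuit breaker, I've also been checking the per-node breaker stats, and was considering temporarily raising the parent limit while the shards recover and then retrying the failed allocations. This is only a sketch of what I had in mind (the 98% value is a guess, not something I've confirmed is safe):

curl -s "http://serverip:9200/_nodes/stats/breaker?pretty"

curl -s -X PUT "http://serverip:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "indices.breaker.total.limit": "98%"
  }
}'

curl -s -X POST "http://serverip:9200/_cluster/reroute?retry_failed=true"

If that helps, I'd set "indices.breaker.total.limit" back to null once the cluster is green again, but I'm not sure this is the right approach or whether the nodes simply need more heap.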

Could you help me solve this?
Thanks, Rodrigo.
