Hi @chaitra_hegde,
"marking and sending shard failed due to [failed recovery]" is really two parts:
- "marking and sending shard failed": this happens when a shard fails for some reason. The node holding the shard tells the master node that the shard failed, and the master node takes appropriate action (for example, allocating it somewhere else).
- "[failed recovery]": this indicates that the shard failed while it was being recovered (initialized). Recovery here is when a replica shard is initialized by copying data over from the primary shard.
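To make the two-part split concrete, here is a small Python sketch that separates the phrase into the action and the bracketed reason. The function name split_shard_failure is hypothetical, and it assumes the message follows the "<action> due to [<reason>]" shape quoted above; real log lines may carry extra context (such as index and shard IDs), which would end up in the action part.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper: assumes the "<action> due to [<reason>]" shape of the message.
_SHARD_FAILURE = re.compile(r"^(?P<action>.+?) due to \[(?P<reason>[^\]]+)\]")


def split_shard_failure(message: str) -> Optional[Tuple[str, str]]:
    """Return (action, reason) for a shard-failure message, or None if it doesn't match."""
    match = _SHARD_FAILURE.search(message)
    if match is None:
        return None
    return match.group("action"), match.group("reason")


action, reason = split_shard_failure(
    "marking and sending shard failed due to [failed recovery]"
)
print(action)  # marking and sending shard failed
print(reason)  # failed recovery
```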
The specific trigger here was a circuit breaker exception. A circuit breaker trips when ES estimates that too much memory is in use, either by a specific subsystem or overall. In this case it was the parent breaker that tripped, and the parent breaker tracks overall memory use. The mentioned fix could help here, or this could be genuine memory overuse. See also this comment for a deeper breakdown of the circuit breaker message.
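As a rough mental model of the parent breaker, the Python sketch below checks a new memory reservation against the parent limit. It is a simplification, not the actual ES implementation: the class ParentBreakerModel is hypothetical, it ignores per-breaker overhead multipliers, and it hard-codes the documented defaults for indices.breaker.total.limit (95% of the JVM heap when indices.breaker.total.use_real_memory is true, 70% when it is false).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ParentBreakerModel:
    """Simplified, hypothetical model of the parent circuit breaker check."""

    heap_max_bytes: int
    use_real_memory: bool = True
    # Per-child estimates (e.g. fielddata, request); only used without real-memory accounting.
    child_estimates: Dict[str, int] = field(default_factory=dict)

    @property
    def limit_bytes(self) -> int:
        # Documented defaults for indices.breaker.total.limit (integer math avoids float rounding).
        percent = 95 if self.use_real_memory else 70
        return self.heap_max_bytes * percent // 100

    def would_trip(self, new_bytes: int, real_heap_used_bytes: int = 0) -> bool:
        """True if reserving new_bytes would push usage over the parent limit."""
        if self.use_real_memory:
            # Real-memory mode: compare actual heap usage plus the new reservation.
            used = real_heap_used_bytes
        else:
            # Estimate mode: sum what the child breakers are tracking (overhead ignored).
            used = sum(self.child_estimates.values())
        return used + new_bytes > self.limit_bytes


breaker = ParentBreakerModel(heap_max_bytes=1_000)
print(breaker.limit_bytes)  # 950
print(breaker.would_trip(100, real_heap_used_bytes=900))  # True
```

In this model, real-memory mode compares against actual heap usage, so memory held by anything on the node counts toward the limit, not only what the child breakers account for. That is one reason a parent breaker trip can point at general heap pressure rather than one specific request.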