Unassigned shards -- max retries exceeded

Running on 7.10

I have several indexes with unassigned replica shards, although the primaries are OK. For example:

{
  "index" : "authm-000005",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-02-15T09:19:47.045Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [6UDagJW2T3eWM-0PQJ0rMA]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[authm-000005][1]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [955200ms]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",

All nodes give the same reason for blocking allocation.

      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-02-15T09:19:47.045Z], failed_attempts[5], failed_nodes[[6UDagJW2T3eWM-0PQJ0rMA]], delayed=false, details[failed shard on node [6UDagJW2T3eWM-0PQJ0rMA]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[authm-000005][1]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [955200ms]]; ], allocation_status[no_attempt]]]"
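To confirm that every node is refusing the shard for the same `max_retry` reason, one option is to scan the allocation explain output programmatically. The sketch below is a minimal illustration. It assumes the response has already been fetched and parsed into a dict, and that it uses the `node_allocation_decisions` / `deciders` layout that allocation explain returns for an unassigned shard. The helper name `nodes_blocked_by_max_retry` and the node names in the sample are hypothetical.

```python
import json
from typing import Any, Dict, List


def nodes_blocked_by_max_retry(explain: Dict[str, Any]) -> List[str]:
    """Return the node names (or ids) whose max_retry decider answered NO.

    Assumes `explain` is the parsed JSON of an allocation explain response
    for an unassigned shard (hypothetical helper, not part of any client).
    """
    blocked = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decider") == "max_retry" and decider.get("decision") == "NO":
                # Prefer the human-readable name, fall back to the node id.
                blocked.append(node.get("node_name") or node.get("node_id", "?"))
                break
    return blocked


# Trimmed, hypothetical sample shaped like the output quoted above.
sample = json.loads("""
{
  "index": "authm-000005",
  "shard": 1,
  "primary": false,
  "current_state": "unassigned",
  "can_allocate": "no",
  "node_allocation_decisions": [
    {"node_id": "6UDagJW2T3eWM-0PQJ0rMA", "node_name": "node-a",
     "deciders": [{"decider": "max_retry", "decision": "NO", "explanation": "..."}]},
    {"node_id": "hypothetical-id-2", "node_name": "node-b",
     "deciders": [{"decider": "max_retry", "decision": "NO", "explanation": "..."}]}
  ]
}
""")
```

If `nodes_blocked_by_max_retry(sample)` lists every node in the cluster, the shard is stuck purely on the retry limit rather than on disk, awareness, or filtering rules.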

I understand that some manual intervention is needed to break the deadlock, but I can't figure out what I need to do. I have been trying various reroute commands without getting anywhere.

Well, by the time I had finished typing this, I stumbled across the answer. I tried the "allocate_replica" command of reroute, and it helpfully told me I needed to call _cluster/reroute?retry_failed=true. That printed several hundred lines of output to the terminal (thank heavens I did not ask for pretty!), the cluster immediately started allocating shards again, and 5 minutes later it turned GREEN.
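For anyone who wants to script the same fix, here is a minimal sketch using only the Python standard library. It assumes an unsecured cluster reachable over plain HTTP at `http://localhost:9200` (no TLS, no authentication); adjust for your setup. The function names are hypothetical. It sends `POST /_cluster/reroute?retry_failed=true` with no request body, which is the call the `max_retry` decider message asks for.

```python
import json
import urllib.request

# Assumption: an unsecured cluster on plain HTTP (no TLS, no auth).
DEFAULT_BASE_URL = "http://localhost:9200"


def build_retry_failed_request(base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build POST /_cluster/reroute?retry_failed=true with no request body."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    # No reroute commands are sent; the retry_failed flag alone asks the
    # master to retry shards that were blocked after too many failed attempts.
    return urllib.request.Request(url, data=None, method="POST")


def retry_failed_allocations(base_url: str = DEFAULT_BASE_URL, timeout: float = 30.0) -> bool:
    """Send the request and return the 'acknowledged' flag from the response."""
    req = build_retry_failed_request(base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # The response body is likely large because it includes cluster state,
        # which would explain the hundreds of lines of terminal output.
        body = json.loads(resp.read().decode("utf-8"))
    return bool(body.get("acknowledged", False))
```

Calling `retry_failed_allocations()` against a live cluster should return `True` once the request is acknowledged; shard recovery then proceeds in the background, so cluster health may take a few minutes to turn GREEN.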
