Cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy

ES version: 6.x
The ES cluster is red and returns the error below.

```
sh-4.2# curl -k -XGET https://127.0.0.1:9200/_cluster/allocation/explain?pretty
{
  "index" : "index1",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2022-09-28T06:08:02.154Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index1][2]: obtaining shard lock timed out after 5000ms]; ",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions" : [
    {
      "node_id" : "K_nIcsssdssRvQIagrsf2QLkfIQ",
      "node_name" : "K_nIcsR",
      "transport_address" : "localhost:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "PW4oAHGAT9KLvL24_GEjSQ"
      }
    },
    {
      "node_id" : "uOPt4GKBsfsfSsyuVLVu-IRZ-g",
      "node_name" : "uOPtsff4GK",
      "transport_address" : "localhost:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "sNdzxssfsfsTK4SV6PPz16z6gA4Q"
      },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2022-09-28T06:08:02.154Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[ise][2]: obtaining shard lock timed out after 5000ms]; ], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
```
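
To make the explain output easier to read, here is a minimal Python sketch that lists, per node, whether it holds an in-sync copy and which decider (if any) rejects the allocation. The sample is a hand-trimmed copy of the response above (the long `explanation` text is left out), and `summarize_decisions` is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json

# Trimmed copy of the explain response above (long "explanation" text omitted).
EXPLAIN_JSON = """
{
  "index": "index1",
  "shard": 2,
  "primary": true,
  "can_allocate": "no",
  "node_allocation_decisions": [
    {"node_name": "K_nIcsR", "node_decision": "no",
     "store": {"in_sync": false}},
    {"node_name": "uOPtsff4GK", "node_decision": "no",
     "store": {"in_sync": true},
     "deciders": [{"decider": "max_retry", "decision": "NO"}]}
  ]
}
"""


def summarize_decisions(explain):
    """Return one (node_name, in_sync, decider, decision) row per decider.

    Nodes that report no deciders get a single row with decider=None and
    the node-level decision instead.
    """
    rows = []
    for node in explain.get("node_allocation_decisions", []):
        name = node.get("node_name")
        in_sync = node.get("store", {}).get("in_sync")
        deciders = node.get("deciders", [])
        if not deciders:
            rows.append((name, in_sync, None, node.get("node_decision")))
        for decider in deciders:
            rows.append((name, in_sync, decider.get("decider"), decider.get("decision")))
    return rows


rows = summarize_decisions(json.loads(EXPLAIN_JSON))
for row in rows:
    print(row)
```

Read this way, the only node holding an in-sync copy of the primary is `uOPtsff4GK`, and it is blocked by the `max_retry` decider. The other node is not a candidate because its copy is not in sync, which is what the `allocate_explanation` text is saying.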

What could be causing this? We cannot reproduce it on all of our setups; only some of them are affected. A couple of forum threads suggested retrying the reroute and increasing the maximum number of retries. My question: once we call reroute with retry_failed=true and set max_retries to 15, will that change persist, and will a reroute happen automatically whenever there is a sync issue after 15 retries? I ask because the message says the reroute has to be called manually every time. Please clarify this for me. Below is what I am planning to suggest.

curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'
curl --silent --request PUT --header 'Content-Type: application/json' 127.0.0.1:9200/ise/_settings?pretty=true --data-ascii '{
"index": {
"allocation": {
"max_retries": 15
}
}
}'

Thanks
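
As a rough illustration of how the two calls in the plan differ, the Python sketch below only builds the two HTTP requests and does not send them. `index.allocation.max_retries` is an index setting, so once it is applied it stays in the index settings until someone changes it. The `retry_failed=true` reroute is a single action, which matches the `max_retry` decider message asking for it to be called manually. `BASE_URL` and both function names are assumptions made for this sketch; the index name `ise` comes from the settings command above.

```python
import json
import urllib.request

# Assumption: same HTTPS endpoint as the allocation explain call in this thread.
BASE_URL = "https://127.0.0.1:9200"


def max_retries_request(index, retries):
    """Build a PUT that sets index.allocation.max_retries on one index."""
    body = json.dumps({"index": {"allocation": {"max_retries": retries}}})
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def retry_failed_request():
    """Build a POST asking the cluster to retry failed allocations once."""
    return urllib.request.Request(
        f"{BASE_URL}/_cluster/reroute?retry_failed=true",
        data=b"",
        method="POST",
    )


settings_req = max_retries_request("ise", 15)
reroute_req = retry_failed_request()
print(settings_req.get_method(), settings_req.full_url)
print(reroute_req.get_method(), reroute_req.full_url)
```

Sending either request would be a call to `urllib.request.urlopen`, with an SSL context that matches however the cluster's certificate is trusted (the thread uses `curl -k`, which skips verification). The settings request only needs to be sent once per index, while the reroute request would have to be repeated each time a shard runs out of retries.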

What is the output from the _cluster/stats?pretty&human API?

Please upgrade, 6.X is very much past EOL and no longer supported.

The stats are below. Yes, we are planning to upgrade in the next release, but we have customers who are still on older versions, and we support those versions. Could you please check whether you can help here?

```
curl -k -XGET 'https://localhost:9200/_cluster/stats?pretty'
{
  "_nodes" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "cluster_name" : "ise-elasticsearch",
  "timestamp" : 1667499489816,
  "status" : "green",
  "indices" : {
    "count" : 56,
    "shards" : {
      "total" : 560,
      "primaries" : 280,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 10,
          "max" : 10,
          "avg" : 10.0
        },
        "primaries" : {
          "min" : 5,
          "max" : 5,
          "avg" : 5.0
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 15547,
      "deleted" : 2
    },
    "store" : {
      "size_in_bytes" : 14312098,
      "throttle_time_in_millis" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 1016360,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 48,
      "memory_in_bytes" : 549044,
      "terms_memory_in_bytes" : 430552,
      "stored_fields_memory_in_bytes" : 20384,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 88448,
      "points_memory_in_bytes" : 84,
      "doc_values_memory_in_bytes" : 9576,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 6000,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 2,
      "data" : 2,
      "coordinating_only" : 0,
      "master" : 2,
      "ingest" : 2
    },
    "versions" : [
      "5.5.2"
    ],
    "os" : {
      "available_processors" : 2,
      "allocated_processors" : 2,
      "names" : [
        {
          "name" : "Linux",
          "count" : 2
        }
      ],
      "mem" : {
        "total_in_bytes" : 33275101184,
        "free_in_bytes" : 1456717824,
        "used_in_bytes" : 31818383360,
        "free_percent" : 4,
        "used_percent" : 96
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 726,
        "max" : 728,
        "avg" : 727
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 2594825923,
      "versions" : [
        {
          "version" : "1.8.0_292",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "25.292-b10",
          "vm_vendor" : "Red Hat, Inc.",
          "count" : 2
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 1019559888,
        "heap_max_in_bytes" : 2130051072
      },
      "threads" : 61
    },
    "fs" : {
      "total_in_bytes" : 1204824801280,
      "free_in_bytes" : 1111064612864,
      "available_in_bytes" : 1049815617536,
      "spins" : "true"
    },
    "plugins" : [
      {
        "name" : "SSLPlugin",
        "version" : "1.0",
        "description" : "SSL Plugin desc",
        "classname" : "org.elasticsearch.plugin.ssl.SSLPlugin",
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "nodeTransportModule" : 2
      },
      "http_types" : {
        "httpServerModule" : 2
      }
    }
  }
}
```

This is a positively ancient version that has been EOL for years. You will struggle to get any support, sorry to say.
