Unallocated shards on elasticsearch 5.2.2


(Peter) #1

Hello,

I have an index which has unassigned shards. The explain shows some weird data.

 curl -XGET 'localhost:19276/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d'
{
  "index": "616bf200-b814-4a8b-816e-a4405061e3b8",
  "shard": 1,
  "primary": true
}
'
{
  "index" : "616bf200-b814-4a8b-816e-a4405061e3b8",
  "shard" : 1,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2017-04-05T21:01:02.381Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[616bf200-b814-4a8b-816e-a4405061e3b8][1]: obtaining shard lock timed out after 5000ms]; ",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions" : [
    {
      "node_id" : "FV8SIN2sRf-gYrgba4DzuQ",
      "node_name" : "api.realty.ci-v5-client-cluster-1",
      "transport_address" : "10.0.0.5:19376",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "gI0AijoaR36o-8_CWY028g",
      "node_name" : "api.realty.ci-v5-client-cluster-3",
      "transport_address" : "10.0.0.7:19376",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "vqTVGjJxT2Wxnt4g0l9rmw",
      "node_name" : "api.realty.ci-v5-client-cluster-2",
      "transport_address" : "10.0.0.6:19376",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "V8bd8J8kRZmlInMq_v2GCQ",
        "store_exception" : {
          "type" : "shard_lock_obtain_failed_exception",
          "reason" : "[616bf200-b814-4a8b-816e-a4405061e3b8][1]: obtaining shard lock timed out after 5000ms",
          "index_uuid" : "zqqOM8OXRuGFwLl6rxmYbw",
          "shard" : "1",
          "index" : "616bf200-b814-4a8b-816e-a4405061e3b8"
        }
       },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2017-04-05T21:01:02.381Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[616bf200-b814-4a8b-816e-a4405061e3b8][1]: obtaining shard lock timed out after 5000ms]; ], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}

any idea how to fix this?

Regards,


(Simon Willnauer) #2

In Elasticsearch 5 we added a retry threshold to stop trying to allocate a shard over and over again when it simply isn't possible. You can see the reason here: [quote="zozo6015, post:1, topic:81519"]
"details" : "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[616bf200-b814-4a8b-816e-a4405061e3b8][1]: obtaining shard lock timed out after 5000ms]; ",
"last_allocation_status" : "no"
[/quote]

The shard allocator decides not to try anymore since the last 5 attempts failed: [quote="zozo6015, post:1, topic:81519"]
"explanation" : "shard has exceeded the maximum number of retries [5]
[/quote]

The number of times we retry is controlled by index.allocation.max_retries, which defaults to 5. If you believe the problem that caused the allocation failures has been fixed, you can set it to a higher number, e.g. 6, and see if the shard allocates. If not, it will keep retrying up to the number of failures you configured, giving you time to fix the underlying problem. Note that this setting should only be set on the index that has problems; see the index settings update API.
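Concretely, either approach looks something like this (a sketch using the port 19276 and index name from the original post; adjust to your cluster — these commands assume a reachable Elasticsearch node):

```shell
# Option 1: retry shards that exceeded the allocation retry limit.
# This re-attempts assignment of all failed shards in the cluster:
curl -XPOST 'localhost:19276/_cluster/reroute?retry_failed=true&pretty'

# Option 2: raise the retry limit on just the affected index
# so the allocator tries again on its own:
curl -XPUT 'localhost:19276/616bf200-b814-4a8b-816e-a4405061e3b8/_settings' \
  -H 'Content-Type: application/json' -d'
{
  "index.allocation.max_retries": 6
}
'
```

The reroute call is usually the quicker fix, since it resets the failed-allocation counter without leaving a modified setting behind on the index.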

hope this helps


(Peter) #3

Thank you for the information. It helped a lot.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.