Unassigned Shards with allocation_status deciders_no


(Jonathan Spooner) #1

I'm restoring my data from S3 to a different cluster and some of my indexes are stuck in the UNASSIGNED state with an allocation_status of deciders_no.

1. Verify Snapshot

It appears to be valid and some of the indexes do restore correctly.

GET /_snapshot/s3_repository/for_production_4-21e/

{
  "snapshots": [
    {
      "snapshot": "4-21e",
      "uuid": "4s2-PIUaRHKtFUeOKvrZhw",
      "version_id": 5020299,
      "version": "5.2.2",
      "indices": [
        "sightings-2016-02-01",
        "places"
      ],
      "state": "SUCCESS",
      "start_time": "2017-04-21T15:57:54.189Z",
      "start_time_in_millis": 1492790274189,
      "end_time": "2017-04-21T16:01:17.127Z",
      "end_time_in_millis": 1492790477127,
      "duration_in_millis": 202938,
      "failures": [],
      "shards": {
        "total": 395,
        "failed": 0,
        "successful": 395
      }
    }
  ]
}

2. Restoring from S3 to a different cluster

POST /_snapshot/s3_repository/for_production_4-21e/_restore?wait_for_completion=false
{
  "indices": "sightings-2016-02-01,places",
  "ignore_unavailable": true,
  "include_global_state": true,
  "index_settings": {
    "number_of_replicas": 0
  }
}

But the primary shard for this index is stuck in the UNASSIGNED state:

{
  "state": "UNASSIGNED",
  "primary": true,
  "node": null,
  "relocating_node": null,
  "shard": 0,
  "index": "sightings-2016-02-01",
  "recovery_source": {
    "type": "SNAPSHOT",
    "repository": "s3_repository",
    "snapshot": "for_production_4-21e",
    "version": "5.2.2",
    "index": "sightings-2016-02-01"
  },
  "unassigned_info": {
    "reason": "NEW_INDEX_RESTORED",
    "at": "2017-04-21T16:51:26.260Z",
    "delayed": false,
    "details": "restore_source[s3_repository/for_production_4-21e]",
    "allocation_status": "deciders_no"
  }
}

GET /_cluster/allocation/explain

The response from this endpoint shows that the index can only be allocated to a node with the _id nSGRGqb-RUmgQkW-jbZ7QA OR 82GQkoQ5Q5yftFg_g4Qpvg. But how do we fix this?

"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "voWCoe9BQuiYxNu-jt1c2A",
      "node_name": "voWCoe9",
      "transport_address": "10.1.189.13:9300",
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": """initial allocation of the index is only allowed on nodes [_id:"nSGRGqb-RUmgQkW-jbZ7QA OR 82GQkoQ5Q5yftFg_g4Qpvg"]"""

(Nik Everett) #2

Does the restored index have any allocation filtering in its settings? If it does, you can set the filtering to null or an empty string.


(Jonathan Spooner) #3

Hi @nik9000,

I tried filtering on _name and _ip. I suspect these indexes are not restoring because they were created with the shrink API.

PUT sightings-2016-02-01/_settings
{
  "index.routing.allocation.include._ip": "10.*"
}

GET sightings-2016-02-01/_settings

{
  "sightings-2016-02-01": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_ip": "10.*"
            },
            "initial_recovery": {
              "_id": "nSGRGqb-RUmgQkW-jbZ7QA,82GQkoQ5Q5yftFg_g4Qpvg"
            }
          }
        },
        "allocation": {
          "max_retries": "1"
        },
        "number_of_shards": "1",
        "shrink": {
          "source": {
            "name": "bulk-sightings-2016-02-01",
            "uuid": "8arcuERNTlKYE9fEZ9ZNkQ"
          }
        },
        "provided_name": "sightings-2016-02-01",
        "creation_date": "1490916922072",
        "number_of_replicas": "0",
        "uuid": "hmnvNW-eQum3wihXbtRDhw",
        "version": {
          "created": "5020299",
          "upgraded": "5020299"
        }
      }
    }
  }
}
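
To make the mismatch easier to see, here is a minimal Python sketch that compares the node IDs in initial_recovery._id with the node IDs of the target cluster. The function name missing_initial_recovery_nodes is hypothetical, the settings are trimmed from the response above, and the target node IDs are the ones allocation explain reports for this cluster. It only inspects the settings; it does not change anything on the cluster.

```python
def missing_initial_recovery_nodes(index_settings, cluster_node_ids):
    """Return node IDs named in initial_recovery._id that are absent from the cluster."""
    allocation = (
        index_settings.get("index", {})
        .get("routing", {})
        .get("allocation", {})
    )
    raw_ids = allocation.get("initial_recovery", {}).get("_id", "")
    # The setting is a comma-separated list of node IDs.
    wanted = [node_id.strip() for node_id in raw_ids.split(",") if node_id.strip()]
    return [node_id for node_id in wanted if node_id not in cluster_node_ids]


# Trimmed from the "settings" object in the response above.
settings = {
    "index": {
        "routing": {
            "allocation": {
                "include": {"_ip": "10.*"},
                "initial_recovery": {
                    "_id": "nSGRGqb-RUmgQkW-jbZ7QA,82GQkoQ5Q5yftFg_g4Qpvg"
                },
            }
        }
    }
}

# Node IDs of the target cluster, as reported by allocation explain in this thread.
target_nodes = {
    "voWCoe9BQuiYxNu-jt1c2A",
    "ZNbmCrO6RoaVFQtsdm-R1w",
    "ngwroDfyR2urGlJ4UEZvEw",
    "u5tXF4qQQb2YZxp0hJsFFg",
}

missing = missing_initial_recovery_nodes(settings, target_nodes)
print(missing)
```

Both IDs from the old cluster come back as missing, which matches the filter decider's complaint.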

(Jonathan Spooner) #4

I think the problem is caused by index.routing.allocation.initial_recovery._id, because it still has the same value it had on the previous cluster.

"sightings-2016-02-01": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "initial_recovery": {
              "_id": "nSGRGqb-RUmgQkW-jbZ7QA,82GQkoQ5Q5yftFg_g4Qpvg"

I've


(Jonathan Spooner) #5

GET /_cluster/allocation/explain
{
  "index": "sightings-2016-02-01",
  "shard": 0,
  "primary": true
}

Response

{
  "index": "sightings-2016-02-01",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NEW_INDEX_RESTORED",
    "at": "2017-04-21T17:48:23.476Z",
    "details": "restore_source[s3_repository/for_production_4-21e]",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "voWCoe9BQuiYxNu-jt1c2A",
      "node_name": "voWCoe9",
      "transport_address": "10.1.189.13:9300",
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": """initial allocation of the index is only allowed on nodes [_id:"nSGRGqb-RUmgQkW-jbZ7QA OR 82GQkoQ5Q5yftFg_g4Qpvg"]"""
        }
      ]
    },
    {
      "node_id": "ZNbmCrO6RoaVFQtsdm-R1w",
      "node_name": "ZNbmCrO",
      "transport_address": "10.1.190.226:9300",
      "node_decision": "no",
      "weight_ranking": 2,
      "deciders": [
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": """initial allocation of the index is only allowed on nodes [_id:"nSGRGqb-RUmgQkW-jbZ7QA OR 82GQkoQ5Q5yftFg_g4Qpvg"]"""
        }
      ]
    },
    {
      "node_id": "ngwroDfyR2urGlJ4UEZvEw",
      "node_name": "ngwroDf",
      "transport_address": "10.1.190.21:9300",
      "node_decision": "no",
      "weight_ranking": 3,
      "deciders": [
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": """initial allocation of the index is only allowed on nodes [_id:"nSGRGqb-RUmgQkW-jbZ7QA OR 82GQkoQ5Q5yftFg_g4Qpvg"]"""
        }
      ]
    },
    {
      "node_id": "u5tXF4qQQb2YZxp0hJsFFg",
      "node_name": "u5tXF4q",
      "transport_address": "10.1.189.248:9300",
      "node_decision": "no",
      "weight_ranking": 4,
      "deciders": [
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": """initial allocation of the index is only allowed on nodes [_id:"nSGRGqb-RUmgQkW-jbZ7QA OR 82GQkoQ5Q5yftFg_g4Qpvg"]"""
        }
      ]
    }
  ]
}
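
With more nodes this response gets long, so here is a rough Python sketch that groups the rejecting nodes by decider and explanation. The helper name summarize_blockers is hypothetical, and the sample dictionary is a trimmed copy of the response above (two of the four nodes); it assumes the response shape shown in this thread.

```python
from collections import defaultdict


def summarize_blockers(explain):
    """Group node names by the (decider, explanation) pairs that answered NO."""
    blockers = defaultdict(list)
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                key = (decider["decider"], decider["explanation"])
                blockers[key].append(node["node_name"])
    return dict(blockers)


EXPLANATION = (
    "initial allocation of the index is only allowed on nodes "
    '[_id:"nSGRGqb-RUmgQkW-jbZ7QA OR 82GQkoQ5Q5yftFg_g4Qpvg"]'
)

# Trimmed copy of the allocation explain response above.
explain = {
    "index": "sightings-2016-02-01",
    "shard": 0,
    "primary": True,
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "voWCoe9",
            "node_decision": "no",
            "deciders": [
                {"decider": "filter", "decision": "NO", "explanation": EXPLANATION}
            ],
        },
        {
            "node_name": "ZNbmCrO",
            "node_decision": "no",
            "deciders": [
                {"decider": "filter", "decision": "NO", "explanation": EXPLANATION}
            ],
        },
    ],
}

summary = summarize_blockers(explain)
for (decider, explanation), nodes in summary.items():
    print(f"{decider} rejected {len(nodes)} node(s): {explanation}")
```

Every node is rejected by the same filter decider for the same reason, so this is a single index-level setting rather than a per-node problem.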

(Nik Everett) #6

Can you try setting that initial_recovery option to null? That should clear it.


(Jonathan Spooner) #7

No, it returns an error. I've tried setting the parent objects to null as well.

PUT sightings-2016-02-01/_settings
{
  "index.routing.allocation.initial_recovery._id": null,
  "index.allocation.max_retries": 5
}

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[v_Rt1wM][10.1.189.53:9300][indices:admin/settings/update]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unknown setting [index.routing.allocation.initial_recovery._id] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
  },
  "status": 400
}

(Nik Everett) #8

Let me see if I can reproduce locally. I'll play and get back to you.


(Jonathan Spooner) #9

Thanks! I found this pull request: [initial_recover limits replica allocation.](https://github.com/elastic/elasticsearch/pull/20589)

I guess I'm going to dig through the Elasticsearch source code.


(Nik Everett) #10

OK - I just reproduced this locally against master. I'm fairly sure you can't restore shrunken indices now.


(Jonathan Spooner) #11

What are my options for getting this data from our QA cluster to our production cluster? Reindexing isn't an option since we're in the 2 TB range.


(Nik Everett) #12

I'm not sure! Still investigating. There may not be any good options.


(Jonathan Spooner) #13

Are you an Elasticsearch employee? I can open an issue on GitHub if you are not already doing so.


(Nik Everett) #14

I'm doing it now, yeah. Just getting easy reproduction steps.


(Nik Everett) #15
