Shards remain UNASSIGNED after _restore operation


(Jonathan Spooner) #1

When I run this _restore command, not all of my indexes are restored and the command never returns.

curl -s -XPOST 'localhost:9200/_snapshot/s3_repository/2017-02-10-17/_restore?wait_for_completion=true' -d '{
  "ignore_unavailable": true,
  "include_global_state": false
}'
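
For reference, the same request can also be assembled programmatically, which keeps the query string and the JSON body clearly separated. This is only a minimal sketch using the Python standard library: the helper name build_restore_request is hypothetical, and the request is constructed but never sent.

```python
import json
import urllib.request


def build_restore_request(host, repository, snapshot, wait=True):
    """Build (but do not send) a POST request for the snapshot _restore endpoint."""
    url = "http://{}/_snapshot/{}/{}/_restore".format(host, repository, snapshot)
    if wait:
        # wait_for_completion belongs in the query string, not in the body
        url += "?wait_for_completion=true"
    body = {"ignore_unavailable": True, "include_global_state": False}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_restore_request("localhost:9200", "s3_repository", "2017-02-10-17")
print(req.get_method(), req.full_url)
```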

You can run the _cluster/allocation/explain API to see why the shards stay unassigned.

Q1: What the heck is "last_allocation_status" : "no_attempt"?
Q1.1: "cannot allocate because allocation is not permitted to any of the nodes" Why is allocation not permitted on any nodes? Other indexes restored with no issues.

curl -s -XGET "localhost:9200/_cluster/allocation/explain?pretty"
{
  "index" : "sightings-geohex-2016-01-04",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NEW_INDEX_RESTORED",
    "at" : "2017-02-15T18:21:43.960Z",
    "details" : "restore_source[s3_repository/2017-02-10]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "1wXza4AESD-SO4S4kWHQNA",
      "node_name" : "1wXza4A",
      "transport_address" : "10.148.14.172:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "replica_after_primary_active",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        },
        {
          "decider" : "throttling",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        }
      ]
    }
  ]
}
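
With several nodes the explain output gets long, so it can help to reduce it to just the deciders that said NO. The sketch below does that for a trimmed copy of the response above; the function name blocking_deciders is made up for illustration. For this replica the blocking reason is that its primary is not yet active, which is why the primary shard is worth inspecting next.

```python
import json

# Trimmed copy of the explain response for the replica shard shown above
explain = json.loads("""
{
  "index": "sightings-geohex-2016-01-04",
  "shard": 0,
  "primary": false,
  "can_allocate": "no",
  "node_allocation_decisions": [
    {
      "node_name": "1wXza4A",
      "node_decision": "no",
      "deciders": [
        {"decider": "replica_after_primary_active",
         "decision": "NO",
         "explanation": "primary shard for this replica is not yet active"}
      ]
    }
  ]
}
""")


def blocking_deciders(response):
    """Return (node_name, decider, explanation) for every NO decision."""
    rows = []
    for node in response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider["decision"] == "NO":
                rows.append(
                    (node["node_name"], decider["decider"], decider["explanation"])
                )
    return rows


for row in blocking_deciders(explain):
    print(row)
```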

Let's look at the primary shard specifically.

curl -XGET "$INSTANCE_IP:9200/_cluster/allocation/explain?pretty" -d '{
"index": "sightings-geohex-2016-01-01",
"shard": 0,
"primary": true
}'

Q: Why does the backup look for this _id?
Q: Is the _id different than the node.name?

{
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "can_allocate": "no",
    "current_state": "unassigned",
    "index": "sightings-geohex-2016-01-01",
    "node_allocation_decisions": [
        {
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "initial allocation of the index is only allowed on nodes [_id:\"HUbsbDLGRrWwoQtKlXP3Vw\"]"
                }
            ],
            "node_decision": "no",
            "node_id": "2LS9wxZuS72Wt_lbUybEIw",
            "node_name": "2LS9wxZ",
            "transport_address": "10.91.139.169:9300",
            "weight_ranking": 1
        },
        {
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "initial allocation of the index is only allowed on nodes [_id:\"HUbsbDLGRrWwoQtKlXP3Vw\"]"
                }
            ],
            "node_decision": "no",
            "node_id": "ZRduycOURwyyGn1SZrd72Q",
            "node_name": "ZRduycO",
            "transport_address": "10.61.190.175:9300",
            "weight_ranking": 2
        },
        {
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "initial allocation of the index is only allowed on nodes [_id:\"HUbsbDLGRrWwoQtKlXP3Vw\"]"
                }
            ],
            "node_decision": "no",
            "node_id": "cbs3JIE5T4e3b1kr_3wCAg",
            "node_name": "cbs3JIE",
            "transport_address": "10.164.223.27:9300",
            "weight_ranking": 3
        },
        {
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "initial allocation of the index is only allowed on nodes [_id:\"HUbsbDLGRrWwoQtKlXP3Vw\"]"
                }
            ],
            "node_decision": "no",
            "node_id": "LKi-FBxBRcOqrnsZs_n4Fw",
            "node_name": "HUbsbDLGRrWwoQtKlXP3Vw",
            "transport_address": "10.170.35.169:9300",
            "weight_ranking": 4
        }
    ],
    "primary": true,
    "shard": 0,
    "unassigned_info": {
        "at": "2017-02-15T18:21:43.960Z",
        "details": "restore_source[s3_repository/2017-02-10-17:36:49]",
        "last_allocation_status": "no",
        "reason": "NEW_INDEX_RESTORED"
    }
}
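
One way to check the _id vs. node.name question is to compare the value the filter decider demands against both the node_id and the node_name of every node in the output above. A small sketch, with the values copied by hand from that output:

```python
import re

# (node_id, node_name) pairs copied from the explain output above
nodes = [
    ("2LS9wxZuS72Wt_lbUybEIw", "2LS9wxZ"),
    ("ZRduycOURwyyGn1SZrd72Q", "ZRduycO"),
    ("cbs3JIE5T4e3b1kr_3wCAg", "cbs3JIE"),
    ("LKi-FBxBRcOqrnsZs_n4Fw", "HUbsbDLGRrWwoQtKlXP3Vw"),
]

explanation = (
    "initial allocation of the index is only allowed on nodes "
    '[_id:"HUbsbDLGRrWwoQtKlXP3Vw"]'
)

# Pull the required node id out of the decider's explanation text
required_id = re.search(r'_id:"([^"]+)"', explanation).group(1)

id_matches = [node_id for node_id, _ in nodes if node_id == required_id]
name_matches = [name for _, name in nodes if name == required_id]

print("nodes whose node_id matches:  ", id_matches)
print("nodes whose node_name matches:", name_matches)
```

In this output no node_id equals the required value, while the node_name of the fourth node does. That suggests the filter is compared against node IDs rather than node names, which would explain why even that node answers NO.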

Now taking a look at the index metadata.

Q2: What is allocation.initial_recovery? I can't find it in any docs. Is this looking for a node with that name?
Q3: This index was created as a result of a _shrink operation. The shrink source index was deleted before taking the _snapshot. Routing was set on the source index to "index.routing.allocation.require._name": "shrink_node_name". However, the new index automatically moved to other data nodes before the _snapshot was taken.

{
	"state": "open",
	"settings": {
		"index": {
			"routing": {
				"allocation": {
					"initial_recovery": {
						"_id": "HUbsbDLGRrWwoQtKlXP3Vw"
					}
				}
			},
			"allocation": {
				"max_retries": "1"
			},
			"number_of_shards": "1",
			"shrink": {
				"source": {
					"name": "bulk-sightings-geohex-2016-01-04",
					"uuid": "Y3tQt30zRTKUiQvG39VtyA"
				}
			},
			"provided_name": "sightings-geohex-2016-01-04",
			"creation_date": "1486754454536",
			"number_of_replicas": "1",
			"uuid": "TDvEhpYeTmulcHYgSa0coQ",
			"version": {
				"created": "5020099"
			}
		}
	},
	"mappings": {
		"sighting": {
			"properties": {
				SNIP
			}
		}
	},
	"aliases": [],
	"primary_terms": {
		"0": 1
	},
	"in_sync_allocations": {
		"0": [
			"1HL22qjlTxyi1IllOqACVQ",
			"EZ__d7liTaaTeO2Z6BlJUA"
		]
	}
}

Q4: How do I get the _restore operation to finish correctly?
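
One thing that might be worth trying is the restore API's "ignore_index_settings" option, which lets a restore skip selected index settings. Whether it is allowed to drop index.routing.allocation.initial_recovery._id is an untested assumption here, so treat the following as a sketch of the request body rather than a confirmed fix.

```python
import json

# Assumption: "ignore_index_settings" can drop the initial_recovery filter.
# This has not been verified against the cluster discussed in this thread.
restore_body = {
    "ignore_unavailable": True,
    "include_global_state": False,
    "ignore_index_settings": [
        "index.routing.allocation.initial_recovery._id",
    ],
}

# Print the JSON that would be passed as the body of the _restore request
print(json.dumps(restore_body, indent=2))
```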


(Jonathan Spooner) #2

After a few hours of debugging, I narrowed this down to a problem with _id. I have moved it over to a new question.

