Restoring to a different cluster with version 5.2.0


(Jonathan Spooner) #1

In the guides for restoring to a different cluster it says

If indices in the original cluster were assigned to particular nodes using shard allocation filtering, the same rules will be enforced in the new cluster. Therefore if the new cluster doesn’t contain nodes with appropriate attributes that a restored index can be allocated on, such index will not be successfully restored unless these index allocation settings are changed during restore operation.

In my situation I have an index that was created with the _shrink API. Its source index required routing to a specific node so the shrink could happen. After the new index was created, its shards automatically moved off of the node used for the shrink, so I believe my new index does not contain any custom routing settings.
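For context, a typical shrink workflow looks roughly like this (a sketch; the node name "shrink-node" is illustrative, the index names are the ones from the metadata below):

```shell
# Force every shard of the source index onto a single node and block
# writes -- both are preconditions for _shrink.
curl -XPUT "$INSTANCE_IP:9200/bulk-sightings-geohex-2016-01-04/_settings" -d '{
  "index.routing.allocation.require._name": "shrink-node",
  "index.blocks.write": true
}'

# Shrink the source index into a new single-shard target index.
curl -XPOST "$INSTANCE_IP:9200/bulk-sightings-geohex-2016-01-04/_shrink/sightings-geohex-2016-01-04" -d '{
  "settings": {
    "index.number_of_shards": 1
  }
}'
```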

After deleting the original index I took a _snapshot. The next day I started up a new Elasticsearch cluster and attempted a _restore. Indices that are not products of the _shrink API restore immediately; the others sit in a red state.
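The snapshot and restore calls looked roughly like this (a sketch; the repository and snapshot names are the ones that appear in the allocation output below, and the repository is assumed to be registered already):

```shell
# Take the snapshot into the registered S3 repository.
curl -XPUT "$INSTANCE_IP:9200/_snapshot/s3_repository/2017-02-10-17:36:49?wait_for_completion=true"

# On the new cluster: restore everything from that snapshot.
curl -XPOST "$INSTANCE_IP:9200/_snapshot/s3_repository/2017-02-10-17:36:49/_restore"
```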

When looking at the index metadata I see there is an allocation.initial_recovery._id assigned. This is not something you can set in the config, and it is not the node.name.
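One way to confirm that this _id doesn't match anything in the new cluster is to list the full node IDs with the _cat API:

```shell
# List the full (untruncated) node id, name and IP of every node in
# the cluster. The HUbsbDLGRrWwoQtKlXP3Vw id from the index metadata
# does not appear here, because that node belonged to the old cluster.
curl -XGET "$INSTANCE_IP:9200/_cat/nodes?v&h=id,name,ip&full_id=true"
```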

{
	"state": "open",
	"settings": {
		"index": {
			"routing": {
				"allocation": {
					"initial_recovery": {
						"_id": "HUbsbDLGRrWwoQtKlXP3Vw"
					}
				}
			},
			"allocation": {
				"max_retries": "1"
			},
			"number_of_shards": "1",
			"shrink": {
				"source": {
					"name": "bulk-sightings-geohex-2016-01-04",
					"uuid": "Y3tQt30zRTKUiQvG39VtyA"
				}
			},
			"provided_name": "sightings-geohex-2016-01-04",
			"creation_date": "1486754454536",
			"number_of_replicas": "1",
			"uuid": "UwlPzI9sQQyRVKUijXF5nQ",
			"version": {
				"created": "5020099"
			}
		}
	},
	"in_sync_allocations": {
		"0": [
			"1HL22qjlTxyi1IllOqACVQ",
			"EZ__d7liTaaTeO2Z6BlJUA"
		]
	}
}

Using the _cluster/allocation/explain API we can look at the primary shard.

curl -XGET "$INSTANCE_IP:9200/_cluster/allocation/explain?pretty" -d '{
  "index": "sightings-geohex-2016-01-01",
  "shard": 0,
  "primary": true
}'

Again we get confirmation of our _id problem:

initial allocation of the index is only allowed on nodes [_id:"HUbsbDLGRrWwoQtKlXP3Vw"]

{
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "can_allocate": "no",
    "current_state": "unassigned",
    "index": "sightings-geohex-2016-01-01",
    "node_allocation_decisions": [
        {
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "initial allocation of the index is only allowed on nodes [_id:\"HUbsbDLGRrWwoQtKlXP3Vw\"]"
                }
            ],
            "node_decision": "no",
            "node_id": "2LS9wxZuS72Wt_lbUybEIw",
            "node_name": "2LS9wxZ",
            "transport_address": "10.91.139.169:9300",
            "weight_ranking": 1
        },
        {
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "initial allocation of the index is only allowed on nodes [_id:\"HUbsbDLGRrWwoQtKlXP3Vw\"]"
                }
            ],
            "node_decision": "no",
            "node_id": "ZRduycOURwyyGn1SZrd72Q",
            "node_name": "ZRduycO",
            "transport_address": "10.61.190.175:9300",
            "weight_ranking": 2
        }
    ],
    "primary": true,
    "shard": 0,
    "unassigned_info": {
        "at": "2017-02-15T18:21:43.960Z",
        "details": "restore_source[s3_repository/2017-02-10-17:36:49]",
        "last_allocation_status": "no",
        "reason": "NEW_INDEX_RESTORED"
    }
}

OK, let's review a few claims from the original statement:

If indices in the original cluster were assigned to particular nodes using shard allocation filtering, the same rules will be enforced in the new cluster.

The index created from the _shrink API does not have any routing assigned to it.

index will not be successfully restored unless these index allocation settings are changed during restore operation.

OK, so our index doesn't have a custom _name; it has a mysterious _id. Let's try to ignore that _id by adding it to ignore_index_settings, and set a node name to something in our cluster. And let's change number_of_replicas so we know these settings are being applied.

curl -s -XPOST "$INSTANCE_IP:9200/_snapshot/s3_repository/2017-02-10-17:36:49/_restore?wait_for_completion=true" -d '{
  "indices": "sightings-geohex-2016-01-01",
  "ignore_unavailable": true,
  "include_global_state": false,
  "index_settings": {
    "number_of_replicas":2,
    "index.routing.allocation.require._name": "1wXza4A"
  },
  "ignore_index_settings": [
    "index.routing.allocation.require._id"
  ]
}'

And to my surprise, we get the same result: a red cluster.

So does anyone know what this mystery _id is?

And since _restore?wait_for_completion=true will never return, should it exit with an error code instead of hanging forever?


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.