Elasticsearch - Snapshot having unassigned shards and Repository verification exception


(Nikesh) #1

Hi,
I am working on backing up the indices in my cluster. I have one dedicated master node and two data nodes, and the configuration file on every node contains " path.repo: ["/u01/backup"] ".
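If it helps, I believe the setting each running node actually picked up can also be double-checked with the nodes info API, which should list path.repo under every node's settings:

GET - http://10.50.1.102:9999/_nodes/settings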
First, I created a repository using the API:
PUT - http://10.50.1.102:9999/_snapshot/firstbackup

{
  "indices": "index1",
  "type": "fs",
  "settings": {
    "location": "backup",
    "compress": true
  }
}

The response to this was:

  {"error": {"root_cause": [ {"type": "repository_verification_exception",
                    "reason": "[firstbackup] [[fjUpFsbyRrSeN4I18Tmewg, 'RemoteTransportException[[search_slave2][10.50.1.100:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave2}{fjUpFsbyRrSeN4I18Tmewg}{wQHn0uN6QdOppX2yE5GQlQ}{10.50.1.100}{10.50.1.100:9300}{ml.machine_memory=16656232448, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [vmNm9LzsSpGo_-mYZtPe5w, 'RemoteTransportException[[search_slave1][10.50.1.101:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave1}{vmNm9LzsSpGo_-mYZtPe5w}{loRHnkpNTBGJPb4Xsx_vrQ}{10.50.1.101}{10.50.1.101:9300}{ml.machine_memory=16656236544, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"}   ],
            "type": "repository_verification_exception",
            "reason": "[firstbackup] [[fjUpFsbyRrSeN4I18Tmewg, 'RemoteTransportException[[search_slave2][10.50.1.100:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave2}{fjUpFsbyRrSeN4I18Tmewg}{wQHn0uN6QdOppX2yE5GQlQ}{10.50.1.100}{10.50.1.100:9300}{ml.machine_memory=16656232448, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [vmNm9LzsSpGo_-mYZtPe5w, 'RemoteTransportException[[search_slave1][10.50.1.101:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave1}{vmNm9LzsSpGo_-mYZtPe5w}{loRHnkpNTBGJPb4Xsx_vrQ}{10.50.1.101}{10.50.1.101:9300}{ml.machine_memory=16656236544, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"  },    "status": 500 }

Although it threw an exception here, I could see a new file generated at the specified location.
Next, I created a snapshot using the API: PUT - http://10.50.1.102:9999/_snapshot/firstbackup/snapshot_1?wait_for_completion=true

{ "indices": "index1",
  "ignore_unavailable": true,
  "include_global_state": false  }

The response for this is:

  {  "snapshot": {     "snapshot": "snapshot_1",  "uuid": "B60-2eQbRPC42B3IczSrfA", on_id": 6040099, "version": "6.4.0",  "indices": [  "index1"  ],  "include_global_state": false,  "state": "SUCCESS",
            "start_time": "2019-01-10T13:38:47.963Z",
            "start_time_in_millis": 1547127527963,
            "end_time": "2019-01-10T13:38:48.007Z",
            "end_time_in_millis": 1547127528007,
            "duration_in_millis": 44,
            "failures": [],   "shards": {  "total": 5,  "failed": 0,"successful": 5 }} }

Now, I deleted the index from the cluster.
Next, I tried to restore the deleted index using the API: POST - http://10.50.1.102:9999/_snapshot/firstbackup/snapshot_1/_restore, for which I received the response:

{   "accepted": true }

Now my cluster health shows red status, as both the primary and replica shards of this index are unassigned.
When I run the API http://10.50.1.102:9999/_cluster/allocation/explain?pretty, I get:

{ "index": "index1",
    "shard": 4,
    "primary": false,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "NEW_INDEX_RESTORED",
        "at": "2019-01-10T13:12:06.536Z",
        "details": "restore_source[firstbackup/snapshot_1]",
        "last_allocation_status": "no_attempt"   },
    "can_allocate": "no",
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "node_allocation_decisions": [   {
            "node_id": "fjUpFsbyRrSeN4I18Tmewg",
            "node_name": "search_slave2",
            "transport_address": "10.50.1.120:9300",
            "node_attributes": {
                "ml.machine_memory": "16656232448",
                "ml.max_open_jobs": "20",
                "xpack.installed": "true",
                "ml.enabled": "true"     },
            "node_decision": "no",
            "deciders": [{
                    "decider": "replica_after_primary_active",
                    "decision": "NO",
                    "explanation": "primary shard for this replica is not yet active",
                { "decider": "throttling",
                    "decision": "NO",
                    "explanation": "primary shard for this replica is not yet active" }  },
        { "node_id": "vmNm9LzsSpGo_-mYZtPe5w",
            "node_name": "search_slave1",
            "transport_address": "10.50.1.121:9300",
            "node_attributes": {
                "ml.machine_memory": "16656236544",
                "ml.max_open_jobs": "20",
                "xpack.installed": "true",
                "ml.enabled": "true" },
            "node_decision": "no",
            "deciders": [ {
                    "decider": "replica_after_primary_active",
                    "decision": "NO",
                    "explanation": "primary shard for this replica is not yet active"    },
                { "decider": "throttling",
                    "decision": "NO",
                    "explanation": "primary shard for this replica is not yet active" }  ] }  ] }

Am I missing something here? Is there a configuration that I have missed?


(Christian Dahlqvist) #2

Does the repository path point to a shared filesystem that is accessible by all nodes? Note that it cannot point to a path on each node's local filesystem.
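For example, /u01/backup would typically be the same NFS (or similar) export mounted at the same path on the master node and both data nodes. Once that is in place, you can re-check that every node can read and write the repository with the verify API (a sketch reusing the host and repository name from the post above):

POST - http://10.50.1.102:9999/_snapshot/firstbackup/_verify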


(system) closed #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.