Elasticsearch - Snapshot having unassigned shards and Repository verification exception

Hi,
I am working on backing up the indices present in my cluster. I have a dedicated master node and two data nodes. The configuration file on every node contains path.repo: ["/u01/backup"].
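For reference, the setting lives in each node's elasticsearch.yml and is just the line quoted above (shown here as a minimal sketch, nothing else from my config is included):

# elasticsearch.yml on every node (only the snapshot-related line)
path.repo: ["/u01/backup"]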
First, I created a repository using the API:
PUT - http://10.50.1.102:9999/_snapshot/firstbackup

{
  "indices": "index1",
  "type": "fs",
  "settings": {
    "location": "backup",
    "compress": true
  }
}

For which my response was:

  {"error": {"root_cause": [ {"type": "repository_verification_exception",
                    "reason": "[firstbackup] [[fjUpFsbyRrSeN4I18Tmewg, 'RemoteTransportException[[search_slave2][10.50.1.100:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave2}{fjUpFsbyRrSeN4I18Tmewg}{wQHn0uN6QdOppX2yE5GQlQ}{10.50.1.100}{10.50.1.100:9300}{ml.machine_memory=16656232448, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [vmNm9LzsSpGo_-mYZtPe5w, 'RemoteTransportException[[search_slave1][10.50.1.101:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave1}{vmNm9LzsSpGo_-mYZtPe5w}{loRHnkpNTBGJPb4Xsx_vrQ}{10.50.1.101}{10.50.1.101:9300}{ml.machine_memory=16656236544, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"}   ],
            "type": "repository_verification_exception",
            "reason": "[firstbackup] [[fjUpFsbyRrSeN4I18Tmewg, 'RemoteTransportException[[search_slave2][10.50.1.100:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave2}{fjUpFsbyRrSeN4I18Tmewg}{wQHn0uN6QdOppX2yE5GQlQ}{10.50.1.100}{10.50.1.100:9300}{ml.machine_memory=16656232448, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [vmNm9LzsSpGo_-mYZtPe5w, 'RemoteTransportException[[search_slave1][10.50.1.101:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[firstbackup] a file written by master to the store [/u01/backup/backup] cannot be accessed on the node [{search_slave1}{vmNm9LzsSpGo_-mYZtPe5w}{loRHnkpNTBGJPb4Xsx_vrQ}{10.50.1.101}{10.50.1.101:9300}{ml.machine_memory=16656236544, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]. This might indicate that the store [/u01/backup/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"  },    "status": 500 }

Although it threw an exception here, I could see a new file generated at the specified location.
Next, I created a new snapshot using the API: PUT http://10.50.1.102:9999/_snapshot/firstbackup/snapshot_1?wait_for_completion=true

{ "indices": "index1",
  "ignore_unavailable": true,
  "include_global_state": false  }

The response for this is:

{
  "snapshot": {
    "snapshot": "snapshot_1",
    "uuid": "B60-2eQbRPC42B3IczSrfA",
    "version_id": 6040099,
    "version": "6.4.0",
    "indices": [
      "index1"
    ],
    "include_global_state": false,
    "state": "SUCCESS",
    "start_time": "2019-01-10T13:38:47.963Z",
    "start_time_in_millis": 1547127527963,
    "end_time": "2019-01-10T13:38:48.007Z",
    "end_time_in_millis": 1547127528007,
    "duration_in_millis": 44,
    "failures": [],
    "shards": {
      "total": 5,
      "failed": 0,
      "successful": 5
    }
  }
}

Then I deleted the index from the cluster.
Next, I tried to restore the deleted index using the API: POST - http://10.50.1.102:9999/_snapshot/firstbackup/snapshot_1/_restore, for which I received the response:

{   "accepted": true }
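Note that "accepted": true only means the restore was started, not that it finished; the same call can be made to block until recovery completes, analogous to the snapshot step above (a sketch of that alternative form):

POST http://10.50.1.102:9999/_snapshot/firstbackup/snapshot_1/_restore?wait_for_completion=true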

Now my cluster health shows red status, as both the primary and replica shards of this index are unassigned.
When I run the API - http://10.50.1.102:9999/_cluster/allocation/explain?pretty, I get:

{
  "index": "index1",
  "shard": 4,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NEW_INDEX_RESTORED",
    "at": "2019-01-10T13:12:06.536Z",
    "details": "restore_source[firstbackup/snapshot_1]",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "fjUpFsbyRrSeN4I18Tmewg",
      "node_name": "search_slave2",
      "transport_address": "10.50.1.120:9300",
      "node_attributes": {
        "ml.machine_memory": "16656232448",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "replica_after_primary_active",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        },
        {
          "decider": "throttling",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        }
      ]
    },
    {
      "node_id": "vmNm9LzsSpGo_-mYZtPe5w",
      "node_name": "search_slave1",
      "transport_address": "10.50.1.121:9300",
      "node_attributes": {
        "ml.machine_memory": "16656236544",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "replica_after_primary_active",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        },
        {
          "decider": "throttling",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        }
      ]
    }
  ]
}

Am I missing something here? Is there a configuration that I have missed?

Does the repository path point to a shared filesystem that is accessible by all nodes? Note that it cannot point to paths on each node's local filesystem.
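For example, the directory could be an NFS export (or any other shared filesystem) mounted at /u01/backup on the master and on both data nodes, so that a file written by one node is visible to the others. Once that is in place, the repository check can be repeated with the verify API (a sketch using the repository name from this thread):

POST http://10.50.1.102:9999/_snapshot/firstbackup/_verify

If verification succeeds, the data nodes can read the files the master writes under /u01/backup/backup, and the snapshot and restore steps should then work.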
