Search operation fails after restoring a snapshot

Background -
Using v7.3.2 in a Windows 10 environment.
(I had to downgrade since the snapshot I am interested in was created using V7.3.2.)

I restored a snapshot provided from another V7.3.2 environment using three steps (roughly sketched below) -

  1. Include path.repo in yml file
  2. Register the snapshot - return shows success
  3. Restore the snapshot - return shows success
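
Roughly, what I did for the three steps looked like this (repo and snapshot names replaced).

In the YML file:

path.repo: ["C:\\Elasticsearch\\myrepo"]

Then in the Kibana console:

PUT /_snapshot/myrepo
{
  "type" : "fs",
  "settings" : {
    "location" : "C:\\Elasticsearch\\myrepo"
  }
}

POST /_snapshot/myrepo/snapshot_1/_restore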

After that,
I can list the index properties using the following command (xyz = index name) -

GET /xyz

But the following command fails -

GET /xyz/_search

This is the error I get -

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": []
  },
  "status": 503
}

Can you please check and advise if I am missing any other step post-restore?

Thank you

Hello @debasisc,

  • Do you see any Elasticsearch log entry with additional information? You should see a stack trace.
  • What is the cluster health?
  • What is the output of GET /_cluster/allocation/explain?pretty ?

Thanks @Luca_Belluccini for the prompt response.

  1. I see an error in the log file. I have replaced the index name with "xyz".

  2. Response to the command GET _cluster/health/xyz -
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 28.57142857142857
}
  3. Response to the command GET /_cluster/allocation/explain?pretty -
{
  "index" : "xyz",
  "shard" : 3,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2020-04-07T21:03:44.885Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [
    {
      "node_id" : "rirDLDJNTn6LhEXDWpxd-w",
      "node_name" : "CDDMLW008",
      "transport_address" : "127.0.0.1:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8467283968",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}

Please check and advise.

Thank you

Hello, there is some conflicting information in what you've shared.
The shard 3p (primary shard 3) of your index cannot be found.

  • Did you downgrade Elasticsearch reusing the same hosts, or did you put in place a new cluster?
  • What is the output of GET /_snapshot/<name of your repo>/<name of the snapshot>/_status
  • If the index already exists in your new cluster, it can only be restored if it has the same number of shards and it is closed (see the sketch below).
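
For example, with the xyz placeholder, closing an existing index of the same name before restoring would look like:

POST /xyz/_close

POST /_snapshot/<name of your repo>/<name of the snapshot>/_restore
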
  1. I am using the same Windows machine for V7.3.2. I did not find any way to uninstall V7.5.2, so after downloading V7.3.2 I am simply starting the BAT file from the new location to fire up V7.3.2.

Do I need to do anything different?

  2. Output of GET /_snapshot/myrepo/myindex/_status -
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_missing_exception",
        "reason" : "[myrepo:myindex] is missing"
      }
    ],
    "type" : "snapshot_missing_exception",
    "reason" : "[myrepo:myindex] is missing"
  },
  "status" : 404
}

This leads me to think that something was missed while restoring the snapshot.

Let me repeat what I did.

2A. Placed the repo directory in Windows at C:\Elasticsearch\myrepo

2B. Entered the following line in the YML file (backslashes escaped) -

path.repo: ["C:\\Elasticsearch\\myrepo"]

2C. Then registered using this command -

PUT /_snapshot/myrepo
{
  "type" : "fs",
  "settings" : {
    "location" : "C:\\Elasticsearch\\myrepo"
  }
}

It did return success.

2D. Then restored using this command -

POST /_snapshot/myrepo/snapshot_1/_restore

This also returned success.

2E. Then I tried checking the contents with this command -

GET /myindex

This shows me a proper high-level listing.

Can you please check and tell me what was missed during the restoration?

Thank you

I also tried this command -
GET /_cluster/health/myindex?level=shards

This returns -

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 28.57142857142857,
  "indices" : {
    "myindex" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 0,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5,
      "shards" : {
        "0" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "1" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "2" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "3" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "4" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        }
      }
    }
  }
}
  1. The step is correct if your 7.3.2 installation is not re-using the data path of your previous 7.5.2 installation
  2. I think you had a typo when you ran GET /_snapshot/myrepo/myindex/_status (it seems you left an extra >).
    2A. You copied the files from another cluster? Please be aware that a filesystem snapshot repository must be a directory shared across all the nodes of the cluster.
    2B. Correct
    2C. Correct, but I would add "readonly": true to the repo settings (see the sketch after this list)
    2D. With this request, you're restoring ALL the indices in the snapshot.
    If an index with the same name as the index you want to restore already exists on the cluster, it must have the same number of shards and be closed before restoring.
    If you want to restore a single index, you should use something like:
    POST /_snapshot/myrepo/snapshot_1/_restore
    {
      "indices": "the name of your index",
    }
    
    2E. This is normal
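
For reference, registering the repository as read-only would look something like this (with your placeholder names):

PUT /_snapshot/myrepo
{
  "type" : "fs",
  "settings" : {
    "location" : "C:\\Elasticsearch\\myrepo",
    "readonly" : true
  }
}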

Step-2: I had a typo in the message because I replaced the actual repo name with "myrepo". But in reality the error exists, and I rechecked my input in the Kibana console. I have now corrected my previous message.

Step-2B: Can you please provide the full path.repo setting that I need to insert in the YML file?

Step-2D: Can you please confirm whether the use of "snapshot_1" is correct? I took that from the Elasticsearch online guide, but I am not 100% sure about this part.

I can try adding the index name specifically. I will do that after I receive a response about the path.repo settings.

However, note that V7.3.2 is a fresh installation and I have done no other work in this setup, apart from trying to restore a snapshot which came from an external Unix setup elsewhere.

Thanks for your help.

Response to question-3: The V7.3.2 installation is fresh and the index does not exist. I have done no other work in V7.3.2 apart from trying to restore the snapshot received from the external source (they are also running V7.3.2).

Step-1: Where do I confirm the "data path"?

For V7.5.2, I was using
C:\Elasticsearch\elasticsearch-7.5.2\bin\elasticsearch.bat to start

After installing V7.3.2, I am now using
C:\Elasticsearch\elasticsearch-7.3.2\bin\elasticsearch.bat to start

The old directory path is now renamed as

01/26/2020  06:22 PM    <DIR>          elasticsearch-7.5.2-do-not-use

Do I need to do anything different in order to make a clean switch-over?

You have to run GET /_snapshot/myrepo/snapshot_1/_status
Here myrepo must be the name of the repository you've set up, and snapshot_1 must be the name of the snapshot you want to restore.
You can list the available snapshots using GET /_cat/snapshots/myrepo (replace myrepo with the name of your actual repository).
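
On the data path question: one way you should be able to check which data path the node is actually using is the node stats API, for example:

GET /_nodes/stats/fs

The fs.data entries in the response include the filesystem path the node is using for its data.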

Confirmed that the name of the snapshot is snapshot_1.

GET /_cat/snapshots/myrepo

snapshot_1 SUCCESS 1578424368 19:12:48 1578424368 19:12:48 467ms 1 5 0 5

Also, this is what I get for the snapshot status -

GET /_snapshot/myrepo/snapshot_1/_status

returns

{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "repository" : "myrepo",
      "uuid" : "sGSyKr3VSf69wT0qx8gjGw",
      "state" : "SUCCESS",
      "include_global_state" : true,
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 5,
        "failed" : 0,
        "total" : 5
      },
      "stats" : {
        "incremental" : {
          "file_count" : 20,
          "size_in_bytes" : 2176529
        },
        "total" : {
          "file_count" : 20,
          "size_in_bytes" : 2176529
        },
        "start_time_in_millis" : 1578424368550,
        "time_in_millis" : 408
      },
      "indices" : {
        "myindex" : {
          "shards_stats" : {
            "initializing" : 0,
            "started" : 0,
            "finalizing" : 0,
            "done" : 5,
            "failed" : 0,
            "total" : 5
          },
          "stats" : {
            "incremental" : {
              "file_count" : 20,
              "size_in_bytes" : 2176529
            },
            "total" : {
              "file_count" : 20,
              "size_in_bytes" : 2176529
            },
            "start_time_in_millis" : 1578424368550,
            "time_in_millis" : 408
          },
          "shards" : {
            "0" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 562298
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 562298
                },
                "start_time_in_millis" : 1578424368620,
                "time_in_millis" : 88
              }
            },
            "1" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 244237
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 244237
                },
                "start_time_in_millis" : 1578424368550,
                "time_in_millis" : 56
              }
            },
            "2" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 246933
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 246933
                },
                "start_time_in_millis" : 1578424368801,
                "time_in_millis" : 49
              }
            },
            "3" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 292853
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 292853
                },
                "start_time_in_millis" : 1578424368719,
                "time_in_millis" : 54
              }
            },
            "4" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 830208
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 830208
                },
                "start_time_in_millis" : 1578424368860,
                "time_in_millis" : 98
              }
            }
          }
        }
      }
    }
  ]
}

I think you can perform the following operations:

  1. Delete the index you've recovered
    DELETE myindex
    
  2. Ensure the cluster is in good health (should return green)
    GET _cat/health?v
    
  3. Restore the index without cluster state
    POST /_snapshot/myrepo/snapshot_1/_restore
    {
      "indices": "myindex",
      "ignore_unavailable": true,
      "include_global_state": false
    }
    

Let me know if this results in the correct restore of the index.
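
Once the restore completes, you could verify it with, for example:

GET /_cat/indices/myindex?v

GET /myindex/_search?size=1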

For confidentiality reasons, I changed both the repo name and the index name in my capture below.
Thanks

It is strange that DELETE now fails, although the snapshot status showed me that exact index name.

DELETE myindex

returns

{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [myindex]",
        "index_uuid" : "jHcq0N-tSxaH_BqrWbSJWQ",
        "index" : "myindex"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [myindex]",
    "index_uuid" : "jHcq0N-tSxaH_BqrWbSJWQ",
    "index" : "myindex"
  },
  "status" : 404
}

Hello @debasisc

I've modified my answer too, for what it's worth.

When we perform a DELETE <index name> we're not interacting with snapshots.
We're deleting an index from the cluster.

My objective is:

  • Delete any index on the cluster which has the same name as the index you want to restore from snapshot_1 (doc)
  • Restore the index using the request suggested in my previous message, specifying the name of the index to be restored and disabling the global state restore (doc)

If you have an official support contract (I see you have X-Pack installed, but it might be a trial), I would suggest opening a support case.

Thanks very much for changing the index name, and also for your patient help.
I simply installed the free version from the Elastic site myself and do not have access to support.

What's the best way to restart everything from scratch so that I can follow your restore commands?

What if I stop running Elasticsearch/Kibana, delete the existing V7.3.2 folders from C:\Elasticsearch, and re-deploy from the zip files? Is that the proper way to 'start fresh'?

Thanks again

@Luca_Belluccini - Can you also please suggest the correct syntax for path.repo in the YML file, with the readonly option? Thank you

Currently, I have this -

path.repo: ["C:\\Elasticsearch\\myrepo"]

This is correct.


To register a read-only repo, use the request shared above - the readonly flag goes in the repository settings when you register it, not in path.repo.

Thanks @Luca_Belluccini for suggesting how to make the repo read-only. As I am having several issues (including being unable to delete the index), I am planning to 'reset' and redo the deployment from scratch. Thanks to the various checks (such as the health check) that you suggested, I'll verify the status at each and every milestone. We'll see how that works and whether it lets me work with the index and start retrieving actual data. Thanks again.

No problem, please record the requests you submit and the responses in case you end up in a problematic situation.