Search operation fails after restoring a snapshot

Background -
Using v7.3.2 in a Windows 10 environment.
(I had to downgrade since the snapshot I am interested in was created using V7.3.2.)

I restored a snapshot provided from another V7.3.2 environment using three steps (roughly sketched below) -

  1. Include path.repo in yml file
  2. Register the snapshot - return shows success
  3. Restore the snapshot - return shows success
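
Roughly, what I did for the three steps looked like this (repo and snapshot names replaced).

In the YML file:

path.repo: ["C:\\Elasticsearch\\myrepo"]

Then in the Kibana console:

PUT /_snapshot/myrepo
{
  "type" : "fs",
  "settings" : {
    "location" : "C:\\Elasticsearch\\myrepo"
  }
}

POST /_snapshot/myrepo/snapshot_1/_restore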

After that,
I can list the index properties using the following command (xyz = index name) -

GET /xyz

But the following command fails -

GET /xyz/_search

This is the error I get -

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": []
  },
  "status": 503
}

Can you please check and advise if I am missing any other step post-restore?

Thank you

Hello @debasisc,

  • Do you see any Elasticsearch log entry with additional information? You should see a stack trace.
  • What is the cluster health?
  • What is the output of GET /_cluster/allocation/explain?pretty ?

Thanks @Luca_Belluccini for the prompt response.

  1. I see an error in the log file. I have replaced the index name with "xyz".

  2. Response to the command GET _cluster/health/xyz -
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 28.57142857142857
}
  3. Response to the command GET /_cluster/allocation/explain?pretty -
{
  "index" : "xyz",
  "shard" : 3,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2020-04-07T21:03:44.885Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [
    {
      "node_id" : "rirDLDJNTn6LhEXDWpxd-w",
      "node_name" : "CDDMLW008",
      "transport_address" : "127.0.0.1:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8467283968",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}

Please check and advise.

Thank you

Hello, there is some conflicting information in what you've shared.
The shard 3p (primary shard 3) of your index cannot be found.

  • Did you downgrade Elasticsearch reusing the same hosts, or did you put in place a new cluster?
  • What is the output of GET /_snapshot/<name of your repo>/<name of the snapshot>/_status
  • If the index already exists in your new cluster, it can only be restored if it has the same number of shards and it is closed (see the sketch below).
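
For example, with the xyz placeholder, closing an existing index of the same name before restoring would look like:

POST /xyz/_close

POST /_snapshot/<name of your repo>/<name of the snapshot>/_restore
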
  1. I am using the same Windows machine for V7.3.2. I did not find any way to uninstall V7.5.2, so after downloading V7.3.2 I am simply starting the BAT file from the new location to fire up V7.3.2.

Do I need to do anything different?

  2. Output of GET /_snapshot/myrepo/myindex/_status -
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_missing_exception",
        "reason" : "[myrepo:myindex] is missing"
      }
    ],
    "type" : "snapshot_missing_exception",
    "reason" : "[myrepo:myindex] is missing"
  },
  "status" : 404
}

This leads me to think that something was missed while restoring the snapshot.

Let me repeat what I did.

2A. Placed the repo directory in Windows at C:\Elasticsearch\myrepo

2B. Entered the following line in the YML file (backslashes escaped) -

path.repo: ["C:\\Elasticsearch\\myrepo"]

2C. Then registered using this command -

PUT /_snapshot/myrepo
{
  "type" : "fs",
  "settings" : {
    "location" : "C:\\Elasticsearch\\myrepo"
  }
}

It did return success.

2D. Then restored using this command -

POST /_snapshot/myrepo/snapshot_1/_restore

This also returned success.

2E. Then I tried checking the contents with this command -

GET /myindex

This shows me a proper high-level listing.

Can you please check and tell me what was missed during the restoration?

Thank you

I also tried this command -
GET /_cluster/health/myindex?level=shards

This returns -

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 28.57142857142857,
  "indices" : {
    "myindex" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 0,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5,
      "shards" : {
        "0" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "1" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "2" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "3" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        },
        "4" : {
          "status" : "red",
          "primary_active" : false,
          "active_shards" : 0,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 1
        }
      }
    }
  }
}
  1. The step is correct if your 7.3.2 installation is not re-using the data path of your previous 7.5.2 installation
  2. I think you had a typo when you ran GET /_snapshot/myrepo/myindex/_status (it seems you left an extra >).
    2A. You copied the files from another cluster? Please be aware that a filesystem snapshot repository must be a directory shared across all the nodes of the cluster.
    2B. Correct
    2C. Correct, but I would add "readonly": true to the repo settings (see the sketch after this list)
    2D. With this request, you're restoring ALL the indices in the snapshot.
    If an index with the same name as the index you want to restore already exists on the cluster, it must have the same number of shards and be closed before restoring.
    If you want to restore a single index, you should use something like:
    POST /_snapshot/myrepo/snapshot_1/_restore
    {
      "indices": "the name of your index",
    }
    
    2E. This is normal
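
For reference, registering the repository as read-only would look something like this (with your placeholder names):

PUT /_snapshot/myrepo
{
  "type" : "fs",
  "settings" : {
    "location" : "C:\\Elasticsearch\\myrepo",
    "readonly" : true
  }
}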

Step-2: I had a typo in the message because I replaced the actual repo name with "myrepo". But in reality the error exists, and I rechecked my input in the Kibana console. I have now corrected my previous message.

Step-2B: Can you please provide the full path.repo setting that I need to insert in the YML file?

Step-2D: Can you please confirm whether the use of "snapshot_1" is correct? I took that from the Elasticsearch online guide, but I am not 100% sure about this part.

I can try adding the index name specifically. I will do that after I receive a response about the path.repo settings.

However, note that V7.3.2 is a fresh installation and I have done no other work in this setup, apart from trying to restore a snapshot which came from an external Unix setup elsewhere.

Thanks for your help.

Response to question-3: The V7.3.2 installation is fresh and the index does not exist. I have done no other work in V7.3.2 apart from trying to restore the snapshot received from the external source (they are also running V7.3.2).

Step-1: Where do I confirm the "data path"?

For V7.5.2, I was using
C:\Elasticsearch\elasticsearch-7.5.2\bin\elasticsearch.bat to start

After installing V7.3.2, I am now using
C:\Elasticsearch\elasticsearch-7.3.2\bin\elasticsearch.bat to start

The old directory path is now renamed as

01/26/2020  06:22 PM    <DIR>          elasticsearch-7.5.2-do-not-use

Do I need to do anything different in order to make a clean switch-over?

You have to run GET /_snapshot/myrepo/snapshot_1/_status
Here myrepo must be the name of the repository you've set up, and snapshot_1 must be the name of the snapshot you want to restore.
You can list the available snapshots using GET /_cat/snapshots/myrepo (replace myrepo with the name of your actual repository).
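
On the data path question: one way you should be able to check which data path the node is actually using is the node stats API, for example:

GET /_nodes/stats/fs

The fs.data entries in the response include the filesystem path the node is using for its data.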

Confirmed that the name of the snapshot is snapshot_1.

GET /_cat/snapshots/myrepo

snapshot_1 SUCCESS 1578424368 19:12:48 1578424368 19:12:48 467ms 1 5 0 5

Also, this is what I get for the snapshot status -

GET /_snapshot/myrepo/snapshot_1/_status

returns

{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "repository" : "myrepo",
      "uuid" : "sGSyKr3VSf69wT0qx8gjGw",
      "state" : "SUCCESS",
      "include_global_state" : true,
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 5,
        "failed" : 0,
        "total" : 5
      },
      "stats" : {
        "incremental" : {
          "file_count" : 20,
          "size_in_bytes" : 2176529
        },
        "total" : {
          "file_count" : 20,
          "size_in_bytes" : 2176529
        },
        "start_time_in_millis" : 1578424368550,
        "time_in_millis" : 408
      },
      "indices" : {
        "myindex" : {
          "shards_stats" : {
            "initializing" : 0,
            "started" : 0,
            "finalizing" : 0,
            "done" : 5,
            "failed" : 0,
            "total" : 5
          },
          "stats" : {
            "incremental" : {
              "file_count" : 20,
              "size_in_bytes" : 2176529
            },
            "total" : {
              "file_count" : 20,
              "size_in_bytes" : 2176529
            },
            "start_time_in_millis" : 1578424368550,
            "time_in_millis" : 408
          },
          "shards" : {
            "0" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 562298
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 562298
                },
                "start_time_in_millis" : 1578424368620,
                "time_in_millis" : 88
              }
            },
            "1" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 244237
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 244237
                },
                "start_time_in_millis" : 1578424368550,
                "time_in_millis" : 56
              }
            },
            "2" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 246933
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 246933
                },
                "start_time_in_millis" : 1578424368801,
                "time_in_millis" : 49
              }
            },
            "3" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 292853
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 292853
                },
                "start_time_in_millis" : 1578424368719,
                "time_in_millis" : 54
              }
            },
            "4" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 4,
                  "size_in_bytes" : 830208
                },
                "total" : {
                  "file_count" : 4,
                  "size_in_bytes" : 830208
                },
                "start_time_in_millis" : 1578424368860,
                "time_in_millis" : 98
              }
            }
          }
        }
      }
    }
  ]
}

I think you can perform the following operations:

  1. Delete the index you've recovered
    DELETE myindex
    
  2. Ensure the cluster is in good health (should return green)
    GET _cat/health?v
    
  3. Restore the index without cluster state
    POST /_snapshot/myrepo/snapshot_1/_restore
    {
      "indices": "myindex",
      "ignore_unavailable": true,
      "include_global_state": false
    }
    

Let me know if this results in the correct restore of the index.
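
Once the restore completes, you could verify it with, for example:

GET /_cat/indices/myindex?v

GET /myindex/_search?size=1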

For confidentiality reasons, I changed both the repo name and the index name in my capture below.
Thanks

It is strange that DELETE now fails, although the snapshot status showed me that exact index name.

DELETE myindex

returns

{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [myindex]",
        "index_uuid" : "jHcq0N-tSxaH_BqrWbSJWQ",
        "index" : "myindex"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [myindex]",
    "index_uuid" : "jHcq0N-tSxaH_BqrWbSJWQ",
    "index" : "myindex"
  },
  "status" : 404
}

Hello @debasisc

I've modified my answer too, for what it's worth.

When we perform a DELETE <index name> we're not interacting with snapshots.
We're deleting an index from the cluster.

My objective is:

  • Delete any index on the cluster which has the same name as the index you want to restore from snapshot_1 (doc)
  • Restore the index using the request suggested in my previous message, specifying the name of the index to be restored and disabling the global state restore (doc)

If you have an official support contract (I see you have X-Pack installed, but it might be a trial), I would suggest opening a support case.

Thanks very much for changing the index name, and also for your patient help.
I simply installed the free version from the Elastic site myself and do not have access to support.

What's the best way to restart everything from scratch so that I can follow your restore commands?

What if I stop running Elasticsearch/Kibana, delete the existing V7.3.2 folders from C:\Elasticsearch, and re-deploy from the zip files? Is that the proper way to 'start fresh'?

Thanks again

@Luca_Belluccini - Can you also please suggest the correct syntax for path.repo in the YML file, with the readonly option? Thank you

Currently, I have this -

path.repo: ["C:\\Elasticsearch\\myrepo"]

This is correct.


To register a read-only repo, use the request shared above - the readonly flag goes in the repository settings when you register it, not in path.repo.

Thanks @Luca_Belluccini for suggesting how to make the repo read-only. As I am having several issues (including being unable to delete the index), I am planning to 'reset' and redo the deployment from scratch. Thanks to the various checks (such as the health check) that you suggested, I'll verify the status at each and every milestone. We'll see how that works and whether it lets me work with the index and start retrieving actual data. Thanks again.

No problem, please record the requests you submit and the responses in case you end up in a problematic situation.