Restore from snapshot fails with no recovery information

I have two similarly configured three-node Elasticsearch clusters. Both run 7.10.2, but one holds more data than the other.

I'm trying to restore an index from a snapshot. On the small cluster, the restore takes about 3 minutes, during which the cluster is yellow. On the big cluster, starting the restore turns the cluster red. I waited a while, but had to delete the restored index while it was still red to bring the cluster back up.

Both clusters have a repository with scheduled backups.
GET /_cat/snapshots/gcs_repository?v
shows 4 snapshots, all with status "SUCCESS".
To give an idea of the size: the small cluster's snapshots take 1-2 minutes, while the large cluster's take 4-12 minutes.

POST /_snapshot/gcs_repository/_verify
shows 3 nodes.

GET /_snapshot/gcs_repository/
shows the repository settings; the only difference here is that the large cluster has "max_snapshot_bytes_per_sec" : "320mb".

When I start the restore, I issue a command like:

POST /_snapshot/gcs_repository/daily-backup-0/_restore
{
  "indices": "foo-000001",
  "rename_pattern": "foo-000001",
  "rename_replacement": "foo-test0"
}

Both clusters respond with "accepted".
The small cluster's health goes yellow, because the new index foo-test0 is yellow. The large cluster goes red, as does the new index.
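(For reference, the cluster health API can report health per index, which shows which index is dragging the cluster red:

GET /_cluster/health?level=indices

The response includes a "status" field for each index.)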

Then I check the status with
GET /foo-test0/_recovery
And the small cluster responds with a bunch of details about the recovery process, including the percent done. Awesome.

But the large cluster responds with
{ }

Since the large cluster is in use, I can't leave it red for long, so I delete the new index and it goes back to green.

  • Any ideas what is going on?
  • Is there any command to check the status of my snapshots for damage?
  • Is there a way to restore the snapshot without affecting the cluster's health?

Thanks for any help, Bill

So that indicates there are no recoveries going on, presumably because they failed quickly. I would expect there to be helpful details in the logs, but the cluster allocation explain API is always the best way to diagnose non-green health.
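With no body, the allocation explain API picks the first unassigned shard it finds and explains why it isn't allocated:

GET /_cluster/allocation/explain

You can also target a specific shard; the index name below assumes the renamed index from the restore above:

GET /_cluster/allocation/explain
{
  "index": "foo-test0",
  "shard": 0,
  "primary": true
}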


Thanks very much David. I totally forgot to check the logs!

There were 4 lines of output each time I tried to restore. The first three end with:
node [o3XmuKUVRwi_nL9WIy0LSA] would have less than the required threshold of 0b free (currently 46.4gb free, estimated shard size is 131.4gb), preventing allocation
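That message means the allocation decider refused the shard for lack of disk space. The free disk per node can be checked with the cat allocation API:

GET /_cat/allocation?v

The disk.avail column shows free space on each node, which can be compared against the estimated shard size from the log message.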

So it's pretty clear what's wrong.
Thank you!


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.