I have two similarly configured 3-node Elasticsearch clusters. Both run 7.10.2, but one is larger than the other.
I'm trying to restore an index from a snapshot. On the small cluster, the restore takes about 3 minutes, during which the cluster is yellow. On the large cluster, starting the restore turns the cluster red. I waited a while, but eventually had to delete the restored index while it was still red so the cluster would be usable again.
Both clusters have a repository with scheduled backups.
GET /_cat/snapshots/gcs_repository?v
Shows 4 snapshots, all have status = "SUCCESS".
To give an idea of the size: snapshot durations on the small cluster are 1-2 minutes; on the large cluster they are 4-12 minutes.
POST /_snapshot/gcs_repository/_verify
Shows 3 nodes.
GET /_snapshot/gcs_repository/
Shows the repository. The only difference here is that the large cluster's repository has "max_snapshot_bytes_per_sec" : "320mb".
When I start the restore, I issue a command like:
POST /_snapshot/gcs_repository/daily-backup-0/_restore
{
  "indices": "foo-000001",
  "rename_pattern": "foo-000001",
  "rename_replacement": "foo-test0"
}
Both clusters respond with "accepted".
The small cluster's health goes to yellow, because the new index foo-test0 is yellow. The large cluster's health goes red, as does the new index.
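For the red index, I suppose the allocation explain API might say why its shards are unassigned. Something like this (using the renamed index and primary shard 0 as an example):

GET /_cluster/allocation/explain
{
  "index": "foo-test0",
  "shard": 0,
  "primary": true
}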
Then I check the status with
GET /foo-test0/_recovery
And the small cluster responds with a bunch of details about the recovery process, including the percent done. Awesome.
But the large cluster responds with
{ }
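Since _recovery returns nothing on the large cluster, I assume the snapshot status API is the other place to look for per-shard progress on that snapshot:

GET /_snapshot/gcs_repository/daily-backup-0/_status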
Since the large cluster is in active use, I can't leave it red for long, so I delete the new index and the cluster goes back to green.
- Any ideas what is going on?
- Is there a command to check my snapshots for corruption?
- Is there a way to restore the snapshot without affecting the cluster's health?
Thanks for any help, Bill