Can't take Elastic 6.8.0 snapshot

Hi, I'm trying to take a snapshot of the cluster:

PUT /_snapshot/my_backup/pre_prod_2

It runs for 3-4 hours and then gets stuck in this state:

"snapshot" : "pre_prod_2",
"repository" : "my_backup",
"uuid" : "YhlA8Ke9Rlu34HpVLDvsLg",
"state" : "STARTED",
"include_global_state" : true,
"shards_stats" : {
"initializing" : 39,
"started" : 4,
"finalizing" : 0,
"done" : 308,
"failed" : 89,
"total" : 440
},

And nothing happens after that.

What should I do? Is it possible to rerun it?

Hi @tfe2012

Two questions:

  1. If you abort this kind of stuck snapshot (by deleting it), does it eventually stop properly?
  2. What are those 89 failed shards? Why did they fail? (can you share logs or the concrete failures?)
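For question 2, here is a minimal Python sketch of how the concrete failures could be pulled out of the response to `GET /_snapshot/my_backup/pre_prod_2/_status`. The layout it walks (`snapshots` → `indices` → `shards`, with a `stage` and a `reason` per shard) is my assumption about the snapshot status API, so please check it against the 6.8 docs. The helper name `failed_shards` and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def failed_shards(status: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect every shard whose stage is FAILURE from a parsed _status response."""
    failures = []
    for snap in status.get("snapshots", []):
        for index_name, index in snap.get("indices", {}).items():
            for shard_id, shard in index.get("shards", {}).items():
                if shard.get("stage") == "FAILURE":
                    failures.append({
                        "index": index_name,
                        "shard": int(shard_id),
                        # "reason" is assumed to be present on failed shards
                        "reason": shard.get("reason", "unknown"),
                    })
    return failures


# Hypothetical sample; a real response would come from the cluster.
sample_status = {
    "snapshots": [{
        "snapshot": "pre_prod_2",
        "repository": "my_backup",
        "state": "STARTED",
        "indices": {
            "example-index": {
                "shards": {
                    "0": {"stage": "DONE"},
                    "1": {"stage": "FAILURE",
                          "reason": "hypothetical failure reason"},
                }
            }
        },
    }]
}

for failure in failed_shards(sample_status):
    print(failure)
```

The per-shard reasons, together with the Elasticsearch logs from the nodes involved, are what would help narrow down why those shards failed.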

What should I do? Is it possible to rerun it?

Aborting and running the snapshot again seems like the best option here if the snapshot isn't making any progress at all. If it's making some progress, letting it finish and then running another snapshot will be faster due to the incremental nature of snapshots. Even if some shards fail during one snapshot, the data it put in the repository will be reused by the next snapshot where possible, so even a partially failed snapshot contributes progress to future snapshots.
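To make the "is it making progress?" decision concrete, here is a rough Python sketch that compares the `shards_stats` counters from two polls of the snapshot status. It is only a coarse heuristic under my own assumptions: it treats a shard as finished once it is counted under `done` or `failed`, and it cannot see a large shard that is still copying data while the counters stay the same. The function names and the second poll are hypothetical.

```python
from typing import Dict


def made_progress(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """True if more shards reached a finished state (done or failed) between polls."""
    finished_before = before["done"] + before["failed"]
    finished_after = after["done"] + after["failed"]
    return finished_after > finished_before


def next_step(before: Dict[str, int], after: Dict[str, int]) -> str:
    """Suggest waiting if shards are still finishing, otherwise abort and rerun."""
    if made_progress(before, after):
        return "wait"
    # Abort with DELETE /_snapshot/my_backup/pre_prod_2,
    # then start again with PUT /_snapshot/my_backup/pre_prod_2
    return "abort_and_rerun"


# Counters from the original post
first_poll = {"initializing": 39, "started": 4, "finalizing": 0,
              "done": 308, "failed": 89, "total": 440}
# Hypothetical later poll with identical counters
second_poll = dict(first_poll)

print(next_step(first_poll, second_poll))  # abort_and_rerun
```

Polling a few minutes apart and only aborting when the counters stay flat across several polls would be a more cautious way to use this.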

Hi,

  1. I deleted the snapshot and started it again, and it works.

  2. I stopped all update/delete processes, connected another remote disk drive, and turned off compression on the snapshot.

After that it started working well. I don't know what exactly helped :slight_smile:

@tfe2012

I stopped all update/delete processes.

Likely this would've sufficed to fix the issue as the snapshot implementation is designed to be resilient to errors. As a tip for next time you run into any trouble, I'd try this first before moving to other more time-consuming work-arounds :slight_smile:
