If you can't delete a "stuck" (In progress/aborted) snapshot

This may only affect older versions, but I've seen several posts about it with no solution other than a rolling restart. I've hit this issue a few times myself and have been able to resolve it without restarting every node.

Run GET /_snapshot/backup/_status (where backup is the repository name) to get the details of the snapshot that is stuck in progress, then search the output for "stage": "INIT". Each time, I found a shard in the initializing (INIT) stage for one of the indices in the stuck snapshot. You'll see something like:

"your_index"
  "3": {
    "stage": "INIT",
    "stats": {
      "number_of_files": 0,
      "processed_files": 0,
      "total_size_in_bytes": 0,
      "processed_size_in_bytes": 0,
      "start_time_in_millis": 1478143477100,
      "time_in_millis": 0
    },
    "node": "somenonfriendlyid"
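If the status output is long, a minimal Python sketch like the one below can do the search for you. It assumes you have saved the response to a file (for example with curl -s localhost:9200/_snapshot/backup/_status > status.json; the host, port and file name are assumptions) and that the response has a top-level "snapshots" list whose entries hold "indices", then "shards" keyed by shard number, as in the excerpt above. find_init_shards is a hypothetical helper, not part of any Elasticsearch client.

```python
import json
import sys
from typing import Any, Dict, List, Tuple


def find_init_shards(status: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return (snapshot, index, shard_id, node_id) for every shard still in INIT.

    Assumes the _snapshot/<repo>/_status response shape: a top-level
    "snapshots" list, each entry holding "indices" -> index -> "shards".
    """
    stuck = []
    for snap in status.get("snapshots", []):
        snap_name = snap.get("snapshot", "?")
        for index_name, index_info in snap.get("indices", {}).items():
            for shard_id, shard in index_info.get("shards", {}).items():
                if shard.get("stage") == "INIT":
                    # "node" holds the node ID, not the friendly node name
                    stuck.append((snap_name, index_name, shard_id, shard.get("node", "?")))
    return stuck


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_init.py status.json
    with open(sys.argv[1]) as f:
        for row in find_init_shards(json.load(f)):
            print("snapshot=%s index=%s shard=%s node=%s" % row)
```

The function only reads the parsed JSON, so it works the same whether the response comes from curl, Kibana Dev Tools, or a client library.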

Then run GET _nodes/stats to find the friendly node name associated with that node ID, and restart only that node. That has worked twice for me.
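Looking up the name by eye in a large _nodes/stats response is tedious, so here is a small Python sketch for that lookup. It assumes the response has been saved to a file and that per-node entries sit under "nodes", keyed by node ID, each with a "name" field; node_name_for_id is a hypothetical helper name.

```python
import json
import sys
from typing import Any, Dict, Optional


def node_name_for_id(nodes_stats: Dict[str, Any], node_id: str) -> Optional[str]:
    """Look up the friendly node name for a node ID in a GET _nodes/stats response.

    Assumes per-node entries live under "nodes", keyed by node ID, each with
    a "name" field. Returns None if the ID is not present.
    """
    node = nodes_stats.get("nodes", {}).get(node_id)
    return node.get("name") if node else None


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python node_name.py nodes_stats.json <node_id>
    with open(sys.argv[1]) as f:
        print(node_name_for_id(json.load(f), sys.argv[2]))
```

Returning None rather than raising makes it easy to spot an ID that no longer belongs to any node in the cluster.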
