Snapshot operation stuck IN_PROGRESS, delete command doesn't work


#1

A snapshot process is stuck in "IN_PROGRESS" although the status of the snapshot is already "ABORTED". Canceling the snapshot or creating a new one is no longer possible.
I already tried the cleanup script from https://github.com/imotov/elasticsearch-snapshot-cleanup, but nothing changed.

This error was described for versions before 1.5.x but still seems to exist, so:

  • Is there a way to resolve this issue without a restart?
  • Will a rolling restart (first master, then data nodes) resolve the issue?

Using Elasticsearch 1.5.2


#2

Has anyone had the same issue and found that a rolling restart solves it (that is, clears the incorrect snapshot state)?


(Casie Owen) #3

Hi --

I've seen this a couple of times and resolved it by running GET /_snapshot/backup/_status to get the details of the snapshot that is stuck in the in-progress state, then searching the output for "stage": "INIT". In each case, I found a shard in the initializing (INIT) stage for one of the indices in the stuck snapshot. You'll see something like:

"your_index"
  "3": {
    "stage": "INIT",
    "stats": {
      "number_of_files": 0,
      "processed_files": 0,
      "total_size_in_bytes": 0,
      "processed_size_in_bytes": 0,
      "start_time_in_millis": 1478143477100,
      "time_in_millis": 0
    },
    "node": "cwUM4qYuRxOpeOT51tyzkA"

You can then run GET /_nodes/stats to look up the friendly name associated with that node ID, and then restart only that node. That has worked twice for me.
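If the status output is large, the search for INIT shards and the node lookup can be scripted. Below is a minimal Python sketch of that idea, not an official tool. It works on the already-parsed JSON bodies of the two requests and assumes they have the usual shape: snapshots → indices → shards for the status call, and a top-level "nodes" map keyed by node ID for the node stats call. The function names find_init_shards and node_names are made up for this example, and the sample data (including "snap1" and "data-node-1") is illustrative only.

```python
def find_init_shards(status_response):
    """Return (snapshot, index, shard_id, node_id) for each shard in stage INIT.

    Assumes the parsed body of GET /_snapshot/<repo>/_status, with the
    layout snapshots -> indices -> shards.
    """
    stuck = []
    for snap in status_response.get("snapshots", []):
        for index_name, index_info in snap.get("indices", {}).items():
            for shard_id, shard in index_info.get("shards", {}).items():
                if shard.get("stage") == "INIT":
                    stuck.append(
                        (snap.get("snapshot"), index_name, shard_id, shard.get("node"))
                    )
    return stuck


def node_names(nodes_stats_response):
    """Map node ID -> friendly node name.

    Assumes the parsed body of GET /_nodes/stats, with a top-level
    "nodes" object keyed by node ID.
    """
    nodes = nodes_stats_response.get("nodes", {})
    return {node_id: info.get("name") for node_id, info in nodes.items()}


if __name__ == "__main__":
    # Illustrative data only; in practice, load the two JSON responses
    # saved from the cluster.
    status = {
        "snapshots": [{
            "snapshot": "snap1",
            "indices": {
                "your_index": {
                    "shards": {
                        "3": {"stage": "INIT", "node": "cwUM4qYuRxOpeOT51tyzkA"},
                    }
                }
            },
        }]
    }
    nodes = {"nodes": {"cwUM4qYuRxOpeOT51tyzkA": {"name": "data-node-1"}}}

    names = node_names(nodes)
    for snapshot, index, shard_id, node_id in find_init_shards(status):
        print(snapshot, index, shard_id, node_id, names.get(node_id))
```

Any node name printed next to an INIT shard is a candidate for the single-node restart described above.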

