Cannot restore snapshot, process already running

snoir · July 29, 2016, 3:39pm

Hello,

We're having some trouble with our Elasticsearch cluster. The cluster has 10 nodes running Debian Jessie with Elasticsearch 2.3.4.

I'm trying to restore an index by running the following command on one of the 10 nodes:

curl -XPOST "http://localhost:9200/_snapshot/es_backup/es-backup-sno/_restore" -d '{"indices": "index-20160718"}'

The command returns this error:
{"error":{"root_cause":[{"type":"concurrent_snapshot_execution_exception","reason":"[es_backup:es-backup-sno] Restore process is already running in this cluster"}],"type":"concurrent_snapshot_execution_exception","reason":"[es_backup:es-backup-sno] Restore process is already running in this cluster"},"status":503}

It looks like a restore is already running. We think an old restore is still running, using a snapshot that no longer exists (it was removed a while ago) on indices that no longer exist (also removed a while ago).

The command curl -s 'http://localhost:9200/_cluster/state' | jq '.restore' returns a restore in progress that uses the nonexistent snapshot on the nonexistent indices (yes, it's kind of a mess...).

es-backup-20160708 is the old snapshot, the old indices are index-201605*, and the shards are in the FAILURE state.

{
  "snapshots": [
    {
      "snapshot": "es-backup-20160708",
      "repository": "es_backup",
      "state": "STARTED",
      "indices": [ ...
      "shards": [
        {
          "index": "index-20160527",
          "shard": 2,
          "state": "FAILURE"
        },
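
For a cluster with many shards, the raw output above can be long. A minimal sketch of a summary, assuming the restore section of the cluster state has the shape shown in this post (a "snapshots" list whose entries carry "repository", "snapshot", "state" and "shards"). The function name summarize_restores is hypothetical and not part of Elasticsearch:

```shell
# Reads cluster state JSON on stdin and prints, for each restore entry:
#   <repository> <snapshot> <state> <number of shards in FAILURE>
# Assumes the field names seen in the output quoted in this thread.
summarize_restores() {
  jq -r '
    .restore.snapshots[]?
    | [ .repository,
        .snapshot,
        .state,
        ([.shards[]? | select(.state == "FAILURE")] | length | tostring)
      ]
    | join(" ")
  '
}

# Usage against a live node (same endpoint as the command in this post):
# curl -s 'http://localhost:9200/_cluster/state' | summarize_restores
```

An empty result would suggest that no restore is registered in the cluster state, so a new restore request should no longer be rejected for that reason.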

We don't know how to kill this running restore. Is there a way to do that?

Thanks,

ywelsch · July 29, 2016, 3:45pm

Have you tried deleting the es-backup-20160708 snapshot?

snoir · July 29, 2016, 3:47pm

The es-backup-20160708 snapshot has already been removed. When I run curl -XDELETE 'http://localhost:9200/_snapshot/es_backup/es-backup-20160708', I get:

{"error":{"root_cause":[{"type":"snapshot_missing_exception","reason":"[es_backup:es-backup-20160708] is missing"}],"type":"snapshot_missing_exception","reason":"[es_backup:es-backup-20160708] is missing","caused_by":{"type":"no_such_file_exception","reason":"/var/backup/elasticsearch/es_backup/snap-es-backup-20160708.dat"}},"status":404}

ywelsch · July 29, 2016, 3:57pm

Is the index index-20160527 still part of the cluster state (i.e. _cat/indices)?

snoir · August 1, 2016, 7:51am

The index is not listed in the cluster state.

ywelsch · August 1, 2016, 8:07am

This looks like a bug. Can you provide some additional information that can help us figure out why this happened?

What type of repository did you use?
Were there any other failures while restoring the snapshot? For example node crashes?

The only way to unblock the cluster for future restores is to do a full-cluster restart.

ywelsch · August 1, 2016, 1:13pm

Just some more thoughts: What happens if you explicitly delete the index:

curl -XDELETE 'http://localhost:9200/index-20160527'

Also, could you send me the full cluster state (private message if it contains confidential information)?

snoir · August 1, 2016, 3:49pm

The backup repository is an NFS share.
There were no other failures during the restore.

We already tried removing all the "ghost" indices, with no success.

I'll send you the full cluster state if I can.

snoir · August 2, 2016, 12:32pm

We found the solution: stopping all the nodes of the cluster at the same time (we had stopped them before, but not all at exactly the same time).
After the full stop, we started all the nodes and the stuck restore process was gone!

Thanks for your help!
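
For anyone hitting the same problem, here is a rough sketch of how the simultaneous stop could be scripted. It is not the exact procedure used in this thread. It assumes SSH access to every node, that Elasticsearch runs as a systemd service named elasticsearch (the usual setup for the Debian package on Jessie), and the host names are placeholders. The function run_on_all is hypothetical. By default it only prints the commands it would run:

```shell
# Placeholder host names: replace with the real list of cluster nodes.
NODES="${NODES:-es-node-01 es-node-02 es-node-03}"

# Run "systemctl <action> elasticsearch" on every node in parallel.
# With DRY_RUN=1 (the default here) the commands are only printed.
run_on_all() {
  action="$1"
  for host in $NODES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $host sudo systemctl $action elasticsearch"
    else
      ssh "$host" sudo systemctl "$action" elasticsearch &
    fi
  done
  wait  # block until every background ssh has returned
}

# Usage (uncomment to run for real):
# DRY_RUN=0
# run_on_all stop    # every node is down before any node comes back
# run_on_all start
```

The point of the wait between the two calls is that no node is started until all of them have stopped, which matches the "exact same time" condition described above.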

ywelsch · August 3, 2016, 10:26am

I've opened an issue on our GitHub repo: https://github.com/elastic/elasticsearch/issues/19774

ywelsch · August 3, 2016, 12:57pm

@snoir just to validate our assumptions on the ticket, could you provide me with logs from around the time when the restore was started? In particular, we are looking for events such as deleted indices, restarted nodes, and changed masters.
