Snapshot restore: failed recovery of index, getting FileAlreadyExistsException

When I try to restore a 2.2 snapshot on a 5.1.1 cluster, the exception below is returned. Out of 10 shards, 2 failed recovery. I am able to restore the same snapshot on 2.2 and 5.5 clusters.
Any idea why? There are no memory problems or permission issues. A similar snapshot restored fine on the 5.1.1 cluster a few days ago.

IndexShardRestoreFailedException[Failed to recover index]; nested: FileAlreadyExistsException[C:\elasticsearch\elasticsearch-5.1.1\data\nodes\0\indices\Zttv7nGBRn-KHM0uakuQ3A\2\index_km_Lucene50_0.doc];

I'd appreciate your help.

How did you run the restore? Did you close the index to be restored and ran the restore again? Is it possible that there are still remnants from the earlier restore on your disk?

Yes, we close the existing index to be restored so that its mapping and delta get updated.

It looks like you're encountering a bug, then (it could be related to https://github.com/elastic/elasticsearch/pull/20220).

Are you seeing the same issue if you repeatedly restore the same index in the 5.5 cluster (and close it in-between restores)?
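For anyone trying the same reproduction, here is a minimal sketch of the close-then-restore cycle in Python, using only the standard library. It assumes a node reachable at http://localhost:9200 and placeholder index, repository and snapshot names; `restore_cycle_requests`, `send` and `repeat_restore` are hypothetical helpers, not part of any Elasticsearch client. The endpoints themselves (`/<index>/_close` and `/_snapshot/<repo>/<snapshot>/_restore`) are the standard REST ones.

```python
import json
import urllib.request

# Assumption: a node of the cluster under test is reachable at this URL.
ES_URL = "http://localhost:9200"


def restore_cycle_requests(index, repository, snapshot):
    """Return the (method, path, body) steps of one close-then-restore cycle."""
    restore_body = {
        "indices": index,               # restore only this index
        "include_global_state": False,  # leave cluster-wide state untouched
    }
    return [
        # An existing open index must be closed before it can be restored over.
        ("POST", f"/{index}/_close", None),
        ("POST",
         f"/_snapshot/{repository}/{snapshot}/_restore?wait_for_completion=true",
         restore_body),
    ]


def send(method, path, body=None):
    """Send one REST request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def repeat_restore(index, repository, snapshot, rounds=5):
    """Close and restore the same index `rounds` times, printing each restore result."""
    for i in range(rounds):
        result = None
        for method, path, body in restore_cycle_requests(index, repository, snapshot):
            result = send(method, path, body)
        print(f"round {i + 1}: {result}")
```

Calling `repeat_restore("my_index", "my_repo", "my_snapshot")` would run five cycles. Because of `wait_for_completion=true`, each printed restore response should include the shard success and failure counts, so a round that hits the FileAlreadyExistsException ought to stand out.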

I will see if I can reproduce this.

I tried closing and restoring again and again on the 5.5 cluster, and the issue does not happen there. It happens only on the 5.1.1 cluster.

I tried restoring on a local machine (5.1.1) as new indices and got 5 unassigned primary shards out of 10. What might be the reason?

For the unassigned shards I see the same exception:
failed recovery, failure RecoveryFailedException[[index][0]: Recovery failed on {o79mICl}{o79mIClXR9iufzWCvjwfaw}{32ae7kuXRYuJSqA-qfM_lA}{localhost}{127.0.0.1:9600}]; nested: IndexShardRecoveryException[failed recovery]; nested: IndexShardRestoreFailedException[restore failed]; nested: IndexShardRestoreFailedException[failed to restore snapshot [prod_snapshot/prod_snapshot]]; nested: IndexShardRestoreFailedException[Failed to recover index]; nested: FileAlreadyExistsException[C:\elasticsearch\elasticsearch-5.1.1\data\nodes\0\indices\isKmuLWLQrWdZvyrVg-rVA\0\index_25m.cfs];

But there is no such file.

That's truly weird. Is it possible to share the repository contents with us?

@Igor_Motov are you aware of such a bug in 5.1.1 that has been fixed in subsequent versions?

@ywelsch yes, I have seen this before in one of our support cases. It seems to happen on some Windows machines. We were not able to reproduce the problem locally, so we don't really know what causes it. However, the fact that the snapshot was originally created on 2.2 is a very useful piece of information.

I cannot think of any snapshot-related fixes between 5.1 and 5.5, but I suspect the bug may be in the store or in Lucene, so it's possible that it was fixed at that level.

Yes, exactly: the issue occurs when the snapshot (created on Ubuntu) is restored on a Windows machine. However, a similar snapshot restored fine on Windows with the same procedure until a few days ago. The snapshot restores fine on 2.2 and 5.5 clusters on Ubuntu; I did not try those on Windows.

@ywelsch, by the way, I tried closing and restoring the snapshot repeatedly on the 5.5 cluster on Ubuntu, but I did not try on a Windows machine. The snapshot contains production data, so I am unable to share the repository contents.

@rinu do you still have this snapshot around somewhere? Would you be able and willing to try restoring it locally, on the system where it fails, with some debug logging enabled?
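In 5.x, debug logging of this kind can be switched on at runtime through the cluster settings API, using `logger.<package>` keys. The sketch below assumes the same local endpoint as before. The choice of packages (`org.elasticsearch.snapshots`, `org.elasticsearch.repositories`, `org.elasticsearch.indices.recovery`) is an assumption about where restore and recovery details are logged, and `debug_logging_settings` and `apply_settings` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: the node where the restore fails is reachable here.
ES_URL = "http://localhost:9200"

# Assumption: packages likely to log snapshot restore and shard recovery details.
PACKAGES = [
    "org.elasticsearch.snapshots",
    "org.elasticsearch.repositories",
    "org.elasticsearch.indices.recovery",
]


def debug_logging_settings(packages, level="DEBUG"):
    """Build a transient cluster-settings body that sets each package's logger to `level`."""
    return {"transient": {f"logger.{pkg}": level for pkg in packages}}


def apply_settings(body):
    """PUT the body to the cluster settings API and return the decoded response."""
    req = urllib.request.Request(
        ES_URL + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Running `apply_settings(debug_logging_settings(PACKAGES))` before the restore, and the same call with `level="INFO"` afterwards, would keep the extra logging confined to the reproduction attempt. Transient settings are also lost on a full cluster restart.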
