Deleting snapshots on S3 sometimes fails with NoSuchFileException

(Dominik Stadler) #1


We run a number of Elasticsearch clusters on Amazon EC2 and take hourly snapshots to S3 using the AWS plugins. Afterwards, the snapshots are "thinned out": older snapshots are progressively removed, so that after some time only one snapshot per day remains, and later on only one per month.
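To make the thinning-out concrete, here is a minimal Python sketch of how such a tiered selection could work. It is not our actual script: the function names, the "keep the earliest snapshot per day/month" rule, and the thresholds (`hourly_days`, `daily_days`) are assumptions for illustration only. The only thing taken from our setup is the snapshot naming scheme, e.g. `snapshot_2018-03-06-02-30-utc`.

```python
from datetime import datetime, timedelta

PREFIX = "snapshot_"
SUFFIX = "-utc"
STAMP = "%Y-%m-%d-%H-%M"  # matches e.g. 2018-03-06-02-30


def parse_name(name):
    """Extract the timestamp from a name like snapshot_2018-03-06-02-30-utc."""
    return datetime.strptime(name[len(PREFIX):-len(SUFFIX)], STAMP)


def snapshots_to_delete(names, now, hourly_days=2, daily_days=30):
    """Return the snapshot names that a tiered retention policy would delete.

    Hypothetical policy (thresholds are assumptions):
      - younger than hourly_days: keep every snapshot
      - up to daily_days old:     keep the earliest snapshot of each day
      - older than daily_days:    keep the earliest snapshot of each month
    """
    keep = set()
    seen_buckets = set()
    # Walk oldest to newest so the earliest snapshot claims each bucket.
    for name in sorted(names, key=parse_name):
        ts = parse_name(name)
        age = now - ts
        if age <= timedelta(days=hourly_days):
            keep.add(name)
            continue
        if age <= timedelta(days=daily_days):
            bucket = ("day", ts.year, ts.month, ts.day)
        else:
            bucket = ("month", ts.year, ts.month)
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            keep.add(name)
    return sorted(n for n in names if n not in keep)
```

Each returned name would then be removed with one `DELETE /_snapshot/<repository>/<snapshot>` request, issued one at a time.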

This usually runs fine, but between 1 and 5 times per week the deletion of a snapshot fails with a NoSuchFileException.

Could this be caused by concurrent access? We try to avoid it, and as far as I know Elasticsearch itself should not allow concurrent snapshot/delete operations anyway.
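For illustration, one way a client could guard against overlapping with a running snapshot is to query `GET /_snapshot/_status` before issuing a delete and to wait if anything is listed. The sketch below only parses the response body; the helper name `running_snapshots` is hypothetical, the HTTP call is left out, and it assumes the response has a top-level `snapshots` array whose entries carry a `snapshot` name. As far as I understand, this does not reveal a delete that is still in progress, so it is only a partial check.

```python
import json


def running_snapshots(status_body):
    """Return the names of snapshots listed in a GET /_snapshot/_status body.

    Assumes a JSON object with a top-level "snapshots" array; an empty or
    missing array is treated as "nothing is running".
    """
    data = json.loads(status_body)
    return [entry.get("snapshot") for entry in data.get("snapshots", [])]
```

A cleanup job could skip or delay its delete whenever this list is non-empty.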

Elasticsearch is on 5.3.3 (it was running 2.3.5 before); we are planning to upgrade to 5.6.7 and then to 6.2.x in the near future.

Is there a known issue that can cause this? Subsequent deletes and snapshots seem to work fine.

[2018-03-07T08:06:29,990][WARN ][r.suppressed             ] path: /_snapshot/xyz/snapshot_2018-03-06-02-30-utc, params: {repository=xyz, snapshot=snapshot_2018-03-06-02-30-utc}
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to write file list
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize( ~[elasticsearch-5.3.3.jar:5.3.3]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.delete( ~[elasticsearch-5.3.3.jar:5.3.3]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.delete( ~[elasticsearch-5.3.3.jar:5.3.3]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot( ~[elasticsearch-5.3.3.jar:5.3.3]
	at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$6( ~[elasticsearch-5.3.3.jar:5.3.3]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ ~[elasticsearch-5.3.3.jar:5.3.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:1.8.0_162]
	at java.util.concurrent.ThreadPoolExecutor$ [?:1.8.0_162]
	at [?:1.8.0_162]
Caused by: java.nio.file.NoSuchFileException: Blob [pending-index-2707] does not exist
	at ~[?:?]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic( ~[elasticsearch-5.3.3.jar:5.3.3]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize( ~[elasticsearch-5.3.3.jar:5.3.3]
	... 8 more
