We're running ES version 1.5.2 across 3 data nodes and 3 clients with around 900GB of data.
While migrating to a new (larger) cluster we attempted to create a snapshot. Partway through, it became apparent that we were going to run out of disk space on the shared mount, so I issued an abort command.
Because of the pending tasks that were already queued up, the abort/delete command wasn't processed before the mount hit 100% usage, and ES threw the following exception:
[2016-08-31 15:22:24,546][WARN ][snapshots ] [node name] [[blehblehbleh]] [manual_snapshot:snapshot_31-08-2016] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: [phoenix_reporting_20150126_8c3d2e02a] Failed to perform snapshot (index files)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:502)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:140)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:85)
    at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:817)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.close0(Native Method)
    at java.io.FileOutputStream.close(FileOutputStream.java:393)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
    at org.elasticsearch.common.blobstore.fs.FsBlobContainer$1.close(FsBlobContainer.java:100)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshotFile(BlobStoreIndexShardRepository.java:559)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:500)
    ... 6 more
I've since added more space to the mount, but the cluster still shows the backup in the "STARTED" state, and a large number of pending tasks have built up, such as the following:
151871375 21.8h NORMAL update snapshot state
151871632 21.8h NORMAL update snapshot state
151869124 21.9h NORMAL update snapshot state
151869189 21.9h NORMAL update snapshot state
151869320 21.9h NORMAL update snapshot state
151869446 21.9h NORMAL update snapshot state
151869575 21.9h NORMAL update snapshot state
151869703 21.9h NORMAL update snapshot state
151869833 21.9h NORMAL update snapshot state
151869959 21.9h NORMAL update snapshot state
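To get a feel for how much of the queue is made up of these entries, the pending task rows can be tallied by source. The following is only a rough sketch: it assumes the rows have the four whitespace-separated columns shown above (insert order, time in queue, priority, source), and the function names and the `SAMPLE` text are made up for illustration.

```python
from collections import Counter


def parse_pending_tasks(text):
    """Parse rows of the form: <insertOrder> <timeInQueue> <priority> <source>."""
    tasks = []
    for line in text.splitlines():
        # Split into at most 4 fields so the multi-word source stays intact.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or malformed rows
        insert_order, time_in_queue, priority, source = parts
        tasks.append({
            "insert_order": int(insert_order),
            "time_in_queue": time_in_queue,
            "priority": priority,
            "source": source.strip(),
        })
    return tasks


def summarize(tasks):
    """Count pending tasks per source description."""
    return Counter(task["source"] for task in tasks)


# Hypothetical sample: the first three rows from the output above.
SAMPLE = """\
151871375 21.8h NORMAL update snapshot state
151871632 21.8h NORMAL update snapshot state
151869124 21.9h NORMAL update snapshot state
"""

if __name__ == "__main__":
    for source, count in summarize(parse_pending_tasks(SAMPLE)).items():
        print(count, source)
```

Run against the full pending task output, this would show whether anything other than "update snapshot state" is waiting in the queue.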
I can't delete the repository or abort the snapshot at this point, and moving the snapshot files that had already been created out of the mount hasn't helped either.
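For reference, the requests involved can be sketched as below. This is a sketch under assumptions rather than exactly what was run: the base URL `http://localhost:9200` is a placeholder, the repository and snapshot names are read off the log line above (taking `manual_snapshot:snapshot_31-08-2016` as repository:snapshot), and the helper names are invented. It only builds the requests and does not send anything.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:9200"   # assumption: placeholder address of a client node
REPOSITORY = "manual_snapshot"       # read from the log line as the repository name
SNAPSHOT = "snapshot_31-08-2016"     # read from the log line as the snapshot name


def snapshot_url(base, repository, snapshot=None, suffix=None):
    """Build a /_snapshot/... URL, percent-encoding the names."""
    parts = ["_snapshot", quote(repository, safe="")]
    if snapshot is not None:
        parts.append(quote(snapshot, safe=""))
    if suffix is not None:
        parts.append(suffix)
    return base.rstrip("/") + "/" + "/".join(parts)


def build_requests(base, repository, snapshot):
    """Return the three requests without sending them."""
    return {
        # Snapshot status, which is where the STARTED state shows up.
        "status": Request(snapshot_url(base, repository, snapshot, "_status"), method="GET"),
        # Deleting a running snapshot is how an abort is requested.
        "abort": Request(snapshot_url(base, repository, snapshot), method="DELETE"),
        # Unregistering the repository itself.
        "unregister": Request(snapshot_url(base, repository), method="DELETE"),
    }


if __name__ == "__main__":
    for name, req in build_requests(BASE_URL, REPOSITORY, SNAPSHOT).items():
        print(name, req.get_method(), req.full_url)
```

Passing any of these to `urllib.request.urlopen` would actually issue the call, so that step is deliberately left out here.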
Is there anything I can do at this point besides restarting the cluster?
Thanks in advance.