Snapshot of indices fails


(vinothini) #1

We have a production Elasticsearch cluster with 3 nodes.
Settings of one of the indices:
{
  "logstash-2016.03.17" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "2"
      }
    }
  }
}
Snapshots of indices fail in our production deployment with the error pasted below.

The failure does not always happen on the same node; the node that reports the error varies between runs (sometimes only node1 reports errors while node2 and node3 do not, and vice versa). The error says "NoSuchFileException: /mnt/nfs/home/indices/logstash-2016.03.16/2", but I can see that the path is available on all the Elasticsearch nodes with full permissions, and the snapshot succeeded for shard ids 0 and 3.

Initially we were taking snapshots with Curator. While searching online, I found that a few people had faced issues with Curator, and it was suggested to create the snapshot with curl using the snapshot API instead. I tried that and hit the same issue.

We are not facing this issue in our test setup: snapshot creation succeeds in the test clusters. Kindly guide us on how to resolve this issue.
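For anyone reproducing this without Curator, here is a minimal sketch of the curl calls involved. Assumptions: `ES_URL` is a placeholder for any node's HTTP address (the real one is redacted in this post), the repository name `snapshots` is taken from the status request in this post, and the function names are hypothetical. The `_verify` call asks the nodes to check that they can use the repository location; it can help rule out a node that cannot use the mount at all, but a passing check does not prove that every node sees every newly created directory immediately.

```shell
# Sketch only. ES_URL is a placeholder (the real address is redacted in the
# post); REPO defaults to the "snapshots" repository used in this thread.
# Function names are hypothetical helpers, not part of Curator or Elasticsearch.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-snapshots}"

# Ask the nodes to check that they can use the repository location.
# The _verify endpoint is available from Elasticsearch 1.4 onwards.
verify_repo() {
  curl -s -XPOST "$ES_URL/_snapshot/$REPO/_verify?pretty"
}

# Snapshot one index ($1) under the snapshot name $2 and wait for completion,
# so the response already carries the final state and any per-shard failures.
take_snapshot() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO/$2?wait_for_completion=true&pretty" \
    -d "{\"indices\": \"$1\"}"
}
```

For example, `verify_repo` followed by `take_snapshot logstash-2016.03.16 snapshot-2016.03.16` corresponds to the snapshot whose status is shown in this post.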
[root@esearch1c ~]# ls -lrth /mnt/nfs/home/indices/logstash-2016.03.16/
total 12K
-rw-r--r-- 1 nobody nobody 1.3K Mar 17 06:11 snapshot-snapshot-2016.03.16
drwxr-xr-x 2 nobody nobody 4.0K Mar 17 06:11 3
drwxr-xr-x 2 nobody nobody 4.0K Mar 17 06:11 0
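A quick way to compare what each node sees is to check the shard directories directly. This is a minimal sketch assuming the repository layout `<repo>/indices/<index>/<shard_id>` visible in the listing above; `missing_shard_dirs` is a hypothetical helper, written in plain POSIX sh so it runs the same way on every node.

```shell
# Sketch: print the shard ids (0 .. num_shards-1) that have no directory
# under an index folder of an "fs" snapshot repository.
# Layout <repo>/indices/<index>/<shard_id> is assumed from the listing above.
missing_shard_dirs() {
  index_dir=$1
  num_shards=$2
  shard=0
  missing=""
  while [ "$shard" -lt "$num_shards" ]; do
    # Record the shard id if its directory is absent (or not a directory).
    [ -d "$index_dir/$shard" ] || missing="$missing $shard"
    shard=$((shard + 1))
  done
  # Strip the leading space added by the accumulator.
  echo "${missing# }"
}
```

Against the listing above, `missing_shard_dirs /mnt/nfs/home/indices/logstash-2016.03.16 5` would print `1 2 4`, matching the failed shards in the status below. Running it on all three nodes at the same moment may show whether the nodes disagree about the contents of the NFS mount.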

Snapshot status reported for this snapshot:

[root@esearch1a elk-curator]# curl -XGET 'http://xxx.xxx.xxx.xxx:9200/_snapshot/snapshots/snapshot-2016.03.16?pretty'
{
  "snapshots" : [ {
    "snapshot" : "snapshot-2016.03.16",
    "version_id" : 1070399,
    "version" : "1.7.3",
    "indices" : [ "logstash-2016.03.16" ],
    "state" : "PARTIAL",
    "start_time" : "2016-03-17T06:09:25.668Z",
    "start_time_in_millis" : 1458194965668,
    "end_time" : "2016-03-17T06:09:42.268Z",
    "end_time_in_millis" : 1458194982268,
    "duration_in_millis" : 16600,
    "failures" : [ {
      "node_id" : "VqvPMaClQ3qj0jgCR2swdQ",
      "index" : "logstash-2016.03.16",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2016.03.16][4] failed to list blobs]; nested: NoSuchFileException[/mnt/nfs/home/indices/logstash-2016.03.16/4]; ",
      "shard_id" : 4,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "VqvPMaClQ3qj0jgCR2swdQ",
      "index" : "logstash-2016.03.16",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2016.03.16][1] failed to list blobs]; nested: NoSuchFileException[/mnt/nfs/home/indices/logstash-2016.03.16/1]; ",
      "shard_id" : 1,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "tQwS7aLAR3S02q0hRmTISQ",
      "index" : "logstash-2016.03.16",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2016.03.16][2] failed to list blobs]; nested: NoSuchFileException[/mnt/nfs/home/indices/logstash-2016.03.16/2]; ",
      "shard_id" : 2,
      "status" : "INTERNAL_SERVER_ERROR"
    } ],
    "shards" : {
      "total" : 5,
      "failed" : 3,
      "successful" : 2
    }
  } ]
}
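Since the failing node changes from run to run, it helps to pull out which node reported which shard. This is a minimal sketch assuming the pretty-printed layout shown above (one key per line, `"node_id"` appearing before `"shard_id"` inside each failure entry); `failed_shards` is a hypothetical helper, and a real JSON parser would be sturdier.

```shell
# Sketch: read a pretty-printed snapshot status response on stdin and print
# one "node_id shard_id" pair per failure. Relies on the one-key-per-line
# layout that ?pretty produces; not a general JSON parser.
failed_shards() {
  awk -F'"' '
    /"node_id"/  { node = $4 }                                   # remember the reporting node
    /"shard_id"/ { gsub(/[^0-9]/, "", $3); print node, $3 }      # keep only the digits
  '
}
```

Piping the status response above through `failed_shards` prints `VqvPMaClQ3qj0jgCR2swdQ 4`, `VqvPMaClQ3qj0jgCR2swdQ 1` and `tQwS7aLAR3S02q0hRmTISQ 2`, one pair per line, which makes it easy to compare runs.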

Error:
[snapshots:snapshot-2016.03.16] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: [logstash-2016.03.16][2] failed to list blobs
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:450)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:148)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:85)
    at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:871)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /mnt/nfs/home/indices/logstash-2016.03.16/2
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
    at java.nio.file.Files.newDirectoryStream(Files.java:457)
    at java.nio.file.Files.newDirectoryStream(Files.java:514)
    at org.elasticsearch.common.blobstore.fs.FsBlobContainer.listBlobsByPrefix(FsBlobContainer.java:65)
    at org.elasticsearch.common.blobstore.fs.FsBlobContainer.listBlobs(FsBlobContainer.java:56)

