ElasticSearch fails to back up certain indices


(Ovidiu Calbajos) #1

Hi there,

Before posting, I searched the forum but found nothing relevant to the issue I've encountered. If a similar post already exists, please point me to it; otherwise, please read on.

Unfortunately I couldn't find a solution to this issue, and frankly this is my last option.
This behavior has been noticed on different indices stored in the clusters and has no connection to the date or the size of the indices. I've seen failing indices from 2 days ago that are 350KB in size and indices older than 20 days that are 14GB in size.

If you need any details besides those provided below, please let me know.

Issue: ElasticSearch fails to snapshot certain indices.

Cause: Unknown

Details:

ES Clusters:

A) 1 Hot node, 1 Warm node
B) 3 Hot nodes, 5 Warm nodes
Both clusters are running 1.6.2

Snapshot storage:

NFS mount accessible from all the nodes. Path: /backups/<name-of-the-cluster>
Available disk space on the backup server: 5.2TB
Available inodes on the backup server: 10485688991

ElasticSearch config:

path:
  data: /usr/share/elasticsearch/data/staging201
  repo: /backups/staging201

Command executed:

curl -XPUT 'http://localhost:9200/_snapshot/staging201/solrsearch-2016.09.26?wait_for_completion=true' -d '{"indices":"solrsearch-2016.09.26", "ignore_unavailable": "true", "include_global_state": false}'

Output on the console:

{"snapshot":{"snapshot":"solrsearch-2016.09.26","indices":["solrsearch-2016.09.26"],"state":"PARTIAL","start_time":"2016-10-18T11:03:50.867Z","start_time_in_millis":1476788630867,"end_time":"2016-10-18T11:04:00.477Z","end_time_in_millis":1476788640477,"duration_in_millis":9610,"failures":[{"node_id":"ytzwGkYJQqWqvpJ_fkv3kQ","index":"solrsearch-2016.09.26","reason":"IndexShardSnapshotFailedException[[solrsearch-2016.09.26][2] failed to list blobs]; nested: NoSuchFileException[/backups/staging201/indices/solrsearch-2016.09.26/2]; ","shard_id":2,"status":"INTERNAL_SERVER_ERROR"},{"node_id":"ytzwGkYJQqWqvpJ_fkv3kQ","index":"solrsearch-2016.09.26","reason":"IndexShardSnapshotFailedException[[solrsearch-2016.09.26][1] failed to list blobs]; nested: NoSuchFileException[/backups/staging201/indices/solrsearch-2016.09.26/1]; ","shard_id":1,"status":"INTERNAL_SERVER_ERROR"},{"node_id":"ytzwGkYJQqWqvpJ_fkv3kQ","index":"solrsearch-2016.09.26","reason":"IndexShardSnapshotFailedException[[solrsearch-2016.09.26][3] failed to list blobs]; nested: NoSuchFileException[/backups/staging201/indices/solrsearch-2016.09.26/3]; ","shard_id":3,"status":"INTERNAL_SERVER_ERROR"}],"shards":{"total":5,"failed":3,"successful":2}}}
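The one-line response above is hard to read, so here is a minimal sketch that groups the failed shards by node and pulls out the path each one could not find. It embeds the three failure entries from the response (timestamps omitted); the function name `failures_by_node` and the regular expression are illustrative choices, not part of any Elasticsearch tooling.

```python
import json
import re
from collections import defaultdict

# The three failure entries from the response above, timestamps omitted.
RESPONSE = """
{"snapshot": {"snapshot": "solrsearch-2016.09.26", "state": "PARTIAL",
 "failures": [
  {"node_id": "ytzwGkYJQqWqvpJ_fkv3kQ", "index": "solrsearch-2016.09.26", "shard_id": 2,
   "reason": "IndexShardSnapshotFailedException[[solrsearch-2016.09.26][2] failed to list blobs]; nested: NoSuchFileException[/backups/staging201/indices/solrsearch-2016.09.26/2]; ",
   "status": "INTERNAL_SERVER_ERROR"},
  {"node_id": "ytzwGkYJQqWqvpJ_fkv3kQ", "index": "solrsearch-2016.09.26", "shard_id": 1,
   "reason": "IndexShardSnapshotFailedException[[solrsearch-2016.09.26][1] failed to list blobs]; nested: NoSuchFileException[/backups/staging201/indices/solrsearch-2016.09.26/1]; ",
   "status": "INTERNAL_SERVER_ERROR"},
  {"node_id": "ytzwGkYJQqWqvpJ_fkv3kQ", "index": "solrsearch-2016.09.26", "shard_id": 3,
   "reason": "IndexShardSnapshotFailedException[[solrsearch-2016.09.26][3] failed to list blobs]; nested: NoSuchFileException[/backups/staging201/indices/solrsearch-2016.09.26/3]; ",
   "status": "INTERNAL_SERVER_ERROR"}
 ],
 "shards": {"total": 5, "failed": 3, "successful": 2}}}
"""

# Hypothetical helper pattern: captures the path inside NoSuchFileException[...].
MISSING_PATH = re.compile(r"NoSuchFileException\[([^\]]+)\]")


def failures_by_node(response_text):
    """Group shard failures by the node that reported them."""
    snapshot = json.loads(response_text)["snapshot"]
    grouped = defaultdict(list)
    for failure in snapshot.get("failures", []):
        match = MISSING_PATH.search(failure["reason"])
        grouped[failure["node_id"]].append({
            "index": failure["index"],
            "shard_id": failure["shard_id"],
            "missing_path": match.group(1) if match else None,
        })
    return dict(grouped)


if __name__ == "__main__":
    for node_id, rows in failures_by_node(RESPONSE).items():
        print(node_id, sorted(row["shard_id"] for row in rows))
```

For this response, all three failed shards (1, 2 and 3) are reported by the same node ID, ytzwGkYJQqWqvpJ_fkv3kQ. That may be a hint to look at how that particular node sees /backups/staging201, though the response alone does not prove it.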

Messages in log file:

SEE NEXT MESSAGE


(Ovidiu Calbajos) #2

Messages in log file:

[2016-10-18 13:03:50,925][WARN ][snapshots ] [cr1-elk-211-staging201] [[solrsearch-2016.09.26][2]] [staging201:solrsearch-2016.09.26] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: [solrsearch-2016.09.26][2] failed to list blobs
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:442)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:140)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:85)
at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:871)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /backups/staging201/indices/solrsearch-2016.09.26/2
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at java.nio.file.Files.newDirectoryStream(Files.java:514)
at org.elasticsearch.common.blobstore.fs.FsBlobContainer.listBlobsByPrefix(FsBlobContainer.java:65)
at org.elasticsearch.common.blobstore.fs.FsBlobContainer.listBlobs(FsBlobContainer.java:56)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:440)
... 6 more
[2016-10-18 13:03:50,956][WARN ][snapshots ] [cr1-elk-211-staging201] [[solrsearch-2016.09.26][1]] [staging201:solrsearch-2016.09.26] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: [solrsearch-2016.09.26][1] failed to list blobs
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:442)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:140)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:85)
at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:871)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /backups/staging201/indices/solrsearch-2016.09.26/1
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at java.nio.file.Files.newDirectoryStream(Files.java:514)
at org.elasticsearch.common.blobstore.fs.FsBlobContainer.listBlobsByPrefix(FsBlobContainer.java:65)
at org.elasticsearch.common.blobstore.fs.FsBlobContainer.listBlobs(FsBlobContainer.java:56)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:440)
... 6 more
[2016-10-18 13:03:50,969][WARN ][snapshots ] [cr1-elk-211-staging201] [[solrsearch-2016.09.26][3]] [staging201:solrsearch-2016.09.26] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: [solrsearch-2016.09.26][3] failed to list blobs
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:442)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:140)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:85)
at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:871)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /backups/staging201/indices/solrsearch-2016.09.26/3
...truncated...


(Ovidiu Calbajos) #3

Anyone?


(Mark Walkom) #4

What are the permissions on the mount?


(Ovidiu Calbajos) #5

The solrsearch-2016.09.26 index directory has mode 755 and is owned by root:root.


(Mark Walkom) #6

What about the rest of it though?


(Ovidiu Calbajos) #7

They all have the same permissions and ownership.
What I find strange is that the snapshot fails with this INTERNAL_SERVER_ERROR message only for certain indices.
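Since the error is a NoSuchFileException on per-shard directories, one way to narrow this down is to run a small probe on every node, as the same OS user that runs Elasticsearch, and compare the results. The sketch below is only an illustration: `probe` and `probe_shards` are hypothetical helper names, and the shard IDs 0 to 4 are assumed from the "total": 5 in the snapshot response. Note that a directory with mode 755 owned by root:root lets only root create entries in it; whether Elasticsearch runs as a non-root user on these nodes is an assumption to verify.

```python
import os


def probe(path):
    """Report whether `path` is a directory and whether this process may create entries in it."""
    exists = os.path.isdir(path)
    return {
        "path": path,
        "exists": exists,
        # Creating files in a directory needs both write and search (execute) permission.
        "writable": exists and os.access(path, os.W_OK | os.X_OK),
    }


def probe_shards(repo_root, index, shard_ids):
    """Probe the index directory and each shard directory under <repo_root>/indices/<index>."""
    index_dir = os.path.join(repo_root, "indices", index)
    rows = [probe(index_dir)]
    rows.extend(probe(os.path.join(index_dir, str(shard))) for shard in shard_ids)
    return rows


if __name__ == "__main__":
    # Repository path and index name taken from the posts above; shard IDs 0-4 are assumed.
    for row in probe_shards("/backups/staging201", "solrsearch-2016.09.26", range(5)):
        print(row)
```

If the output differs between nodes (for example, a shard directory exists on one node but not on another), the nodes may not be seeing the same NFS export. If the index directory exists but is reported as not writable, the ownership and mode of the repository are worth revisiting.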

