I am running into an issue when taking a snapshot of our cluster. The snapshot isn't fully completing, and I am seeing a partial status after it finishes. The node ID listed for every index points to Node1 of the cluster, yet I can see the index IDs being created on the NFS mount/file share location for every node (including Node1), so I am not sure why this is happening. I verified access on the mount, and Elasticsearch has the necessary access to write from all nodes in the cluster. The logs only show that Node1 cannot access the mount. I have attached my findings below, hoping for any kind of help.
Cluster setup is 2 data nodes, 2 master nodes, and 1 coordinating node (Node1 is a data node).
Dev Tools output after running the snapshot:

"reason" : "IndexShardSnapshotFailedException[Failed to snapshot]; nested: ElasticsearchException[failed to create blob container]; nested: AccessDeniedException[/data/disk3/elastic_snapshot/indices/j7EykkgiTICux52ziejbFA/0]"
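For anyone following along, the partial state can be confirmed in Dev Tools with calls along these lines (elastic_snapshot is our repository name; upgrade_snapshot is the snapshot name, as seen in the logs further down):

GET _snapshot/elastic_snapshot/_all
GET _snapshot/elastic_snapshot/upgrade_snapshot/_status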
Cluster log from Node1:
[elastic_snapshot] failed to verify repository
org.elasticsearch.repositories.RepositoryVerificationException: [elastic_snapshot] store location [/data/disk3/elastic_snapshot] is not accessible on the node
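The repository verification failure can also be triggered on demand from Dev Tools, which may help anyone trying to reproduce this:

POST _snapshot/elastic_snapshot/_verify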
It seems likely that this isn't the case. A common problem with NFS-based shared filesystem repositories is that access is determined by user ID by default. You might be running Elasticsearch as the same named user on each node, but if these users have different user IDs then they will have inconsistent permissions in the repository.
If you need more help after checking this, please could you share the whole stack trace of the error in the cluster log?
Thanks for your response @DavidTurner. Can you explain how I can check that? This is a new process for me, so I am learning as I go. I'll also work to provide the full stack logs.
Here is the full error from the stack log, @DavidTurner. I am still looking into it on my end as well. The snapshot is showing as a partial state and I am seeing the indices in the location specified, so I am really confused about why this is happening. The permissions on the indices folder were modified as a test with chmod 777. I have also copied the failed shard count below the logs.
[2019-05-09T08:25:36,365][WARN ][o.e.s.SnapshotShardsService] [server.com] [[.security_audit_log-2019.04.21][0]][elastic_snapshot:upgrade_snapshot/lQv50ZWnRrKEuZ1aljgY0A] failed to snapshot shard
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to snapshot
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:420) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.snapshots.SnapshotShardsService.access$300(SnapshotShardsService.java:97) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:354) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.5.4.jar:6.5.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: org.elasticsearch.ElasticsearchException: failed to create blob container
at org.elasticsearch.common.blobstore.fs.FsBlobStore.blobContainer(FsBlobStore.java:72) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.<init>(BlobStoreRepository.java:947) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.<init>(BlobStoreRepository.java:940) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.<init>(BlobStoreRepository.java:1168) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:851) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:410) ~[elasticsearch-6.5.4.jar:6.5.4]
... 7 more
Caused by: java.nio.file.AccessDeniedException: /data/disk3/elastic_snapshot/indices/Bm9HgASzRdmGlPR_wbiEBQ/0
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384) ~[?:?]
at java.nio.file.Files.createDirectory(Files.java:674) ~[?:1.8.0_191]
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781) ~[?:1.8.0_191]
at java.nio.file.Files.createDirectories(Files.java:767) ~[?:1.8.0_191]
at org.elasticsearch.common.blobstore.fs.FsBlobStore.buildAndCreate(FsBlobStore.java:89) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.blobstore.fs.FsBlobStore.blobContainer(FsBlobStore.java:70) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.<init>(BlobStoreRepository.java:947) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.<init>(BlobStoreRepository.java:940) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.<init>(BlobStoreRepository.java:1168) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:851) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:410) ~[elasticsearch-6.5.4.jar:6.5.4]
... 7 more
One thing to note: I am seeing the ownership of these files get overwritten by the Kibana user. Not sure if that is a big deal or part of the problem, but it seemed worth noting as well.
-rw-r--r--. 1 elasticsearch elasticsearch 29 May 9 08:47 index-5
-rw-r--r--. 1 kibana kibana 11K May 9 08:59 index-6
-rw-r--r--. 1 kibana kibana 8 May 9 08:59 index.latest
drwxrwxrwx. 111 elasticsearch elasticsearch 4.0K May 9 08:59 indices
-rw-r--r--. 1 kibana kibana 98K May 9 08:59 meta-ceo4_PhxQF-y-N2uQyFW5w.dat
-rw-r--r--. 1 kibana kibana 72K May 9 08:59 snap-ceo4_PhxQF-y-N2uQyFW5w.dat
Yes, this all points towards inconsistent user IDs. I would check the output of id elasticsearch and id kibana on each node; I think you will find one node whose kibana user shares a user ID with another node's elasticsearch user.
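To illustrate (the hostnames and numeric IDs here are invented for the example), the mismatch would look something like this:

$ ssh node1 id kibana
uid=994(kibana) gid=990(kibana) groups=990(kibana)
$ ssh node2 id elasticsearch
uid=994(elasticsearch) gid=992(elasticsearch) groups=992(elasticsearch)

NFS resolves ownership by the numeric ID, so files written by node2's elasticsearch user (uid 994) show up on node1 as owned by kibana.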
Good call! It looks like we are seeing matched IDs for Elasticsearch and Kibana from two different servers. I just need to figure out how to change them (all new stuff to me). I will keep you updated with what I do to fix this, but I appreciate you pointing me in the right direction. Thanks again!
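In case it helps someone later, the fix I am planning is roughly this sketch (assuming RHEL-style user tools; 1200 is an arbitrary ID that must be unused on every node, and /var/lib/elasticsearch is the default data path, so adjust for your install):

# stop the service before changing the account
systemctl stop elasticsearch
# give elasticsearch the same numeric UID/GID on every node
groupmod -g 1200 elasticsearch
usermod -u 1200 -g 1200 elasticsearch
# re-own anything still carrying the old numeric IDs
chown -R elasticsearch:elasticsearch /var/lib/elasticsearch /data/disk3/elastic_snapshot
# start the service back up
systemctl start elasticsearch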