A POC for Elasticsearch snapshot and restore is in progress on Azure Linux VMs. The Elasticsearch version is 8.6.2, running as a three-node cluster.
The external storage is another Azure VM in the same VLAN with 200 GB of disk space. We exported that 200 GB mount point as NFS storage because we have no storage provisioned from the cloud.
This repository path is added to the elasticsearch.yml configuration file on all nodes in the cluster. I scheduled an SLM policy to take a daily snapshot, but it failed with "failed to create blob container" and an access denied exception.
At this point, full permissions have been granted on the source and destination mount points, with user and group ownership set to elasticsearch. Both SLM and manual snapshots fail with the access denied exception.
Below are the exceptions logged on the master node. Can someone please help resolve this issue?
Suppressed: org.elasticsearch.ElasticsearchException: failed to create blob container
at org.elasticsearch.common.blobstore.fs.FsBlobStore.blobContainer(FsBlobStore.java:56) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.indexContainer(BlobStoreRepository.java:1630) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$finalizeSnapshot$46(BlobStoreRepository.java:1460) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.action.ActionRunnable$1.doRun(ActionRunnable.java:34) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.6.2.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: java.nio.file.AccessDeniedException: /etc/middleware/indices/8GVR_UIGQ6WDcBklYUpcrw
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397) ~[?:?]
at java.nio.file.Files.createDirectory(Files.java:700) ~[?:?]
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:807) ~[?:?]
at java.nio.file.Files.createDirectories(Files.java:793) ~[?:?]
at org.elasticsearch.common.blobstore.fs.FsBlobStore.buildAndCreate(FsBlobStore.java:68) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.common.blobstore.fs.FsBlobStore.blobContainer(FsBlobStore.java:54) ~[elasticsearch-8.6.2.jar:?]
... 8 more
Caused by: java.nio.file.AccessDeniedException: /etc/middleware/indices/VSBpGoTdRa-uXxYyhyGj0g
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397) ~[?:?]
at java.nio.file.Files.createDirectory(Files.java:700) ~[?:?]
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:807) ~[?:?]
at java.nio.file.Files.createDirectories(Files.java:793) ~[?:?]
at org.elasticsearch.common.blobstore.fs.FsBlobStore.buildAndCreate(FsBlobStore.java:68) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.common.blobstore.fs.FsBlobStore.blobContainer(FsBlobStore.java:54) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.indexContainer(BlobStoreRepository.java:1630) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$finalizeSnapshot$46(BlobStoreRepository.java:1460) ~[elasticsearch-8.6.2.jar:?]
at org.elasticsearch.action.ActionRunnable$1.doRun(ActionRunnable.java:34) ~[elasticsearch-8.6.2.jar:?]
... 5 more
[2024-01-05T14:18:51,407][ERROR][o.e.x.s.SnapshotLifecycleTask] [gva6439] failed to create snapshot for snapshot lifecycle policy [backup_policy1_daily]: org.elasticsearch.snapshots.SnapshotException: [esbackup:backup-2024.01.05-nijkrpp9rd64cgwdk7q-1g/RRz3lwLmTUCSVyAQRt7M_Q] failed to update snapshot in repository
Verify the permissions on the /etc/middleware directory. Elasticsearch needs read, write, and execute permissions on this directory. You can check them by running ls -ld /etc/middleware (plain ls -l lists the directory's contents rather than the directory itself). The output should show that the elasticsearch user has rwx permissions.
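A rough sketch of that check, assuming a Linux shell on each Elasticsearch node. The helper name show_path_perms is made up for illustration. It walks from the given directory up to / because the Elasticsearch user needs the execute (search) bit on every level, and it prints numeric IDs because on NFS it is typically the numeric uid/gid, not the user name, that has to match between the nodes and the NFS server.

```shell
# Hypothetical helper: print mode and numeric owner for a directory and
# every parent up to /. The Elasticsearch user needs the execute (search)
# bit on each level, plus write on the directory it creates entries in.
show_path_perms() {
  dir=$1
  while :; do
    ls -ldn "$dir"    # -d: the directory itself, -n: numeric uid/gid
    case $dir in
      /|.) break ;;   # stop at the filesystem root (or "." for relative paths)
    esac
    dir=$(dirname "$dir")
  done
}

# Example, using the path from the stack trace:
#   show_path_perms /etc/middleware/indices
#   id elasticsearch    # compare uid/gid on every node and on the NFS server
```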
I think you would get a different error (mentioning path.repo) if that were the problem.
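If you want to rule that out anyway, something like the following may help. It is only a sketch: ES_URL and ES_AUTH are placeholders for your cluster address and credentials, the function names are invented, esbackup is the repository name taken from the log line above, and -k (skip TLS certificate checks) is only reasonable on a POC. The first call shows the path.repo value each node loaded; the second uses the repository verification API. As far as I know, verification only exercises a small test write near the repository root, so a pass does not rule out a problem lower down, such as under indices/.

```shell
# Placeholder defaults; override ES_URL and set ES_AUTH=user:password.
ES_URL=${ES_URL:-https://localhost:9200}

# Show the path.repo setting each node actually loaded.
show_path_repo() {
  curl -s -k -u "${ES_AUTH:?set ES_AUTH to user:password}" \
    "$ES_URL/_nodes?filter_path=nodes.*.name,nodes.*.settings.path.repo&pretty"
}

# Ask the nodes to verify that they can write to the named repository.
verify_repo() {
  curl -s -k -u "${ES_AUTH:?set ES_AUTH to user:password}" \
    -X POST "$ES_URL/_snapshot/$1/_verify?pretty"
}

# Example:
#   ES_AUTH=elastic:changeme show_path_repo
#   ES_AUTH=elastic:changeme verify_repo esbackup
```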
Elasticsearch isn't doing anything magic here, it's just trying to create a directory as normal and this error is coming back from the OS. You will get the same problem if you try to run mkdir /etc/middleware/indices/VSBpGoTdRa-uXxYyhyGj0g from the command line as the user that ES is running as.
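A small sketch of that test, assuming the service user is called elasticsearch. The probe_mkdir helper is hypothetical; it just creates and removes a scratch directory and leaves the OS error message visible. Service accounts often have a nologin shell, so an interactive su may be refused; the commented commands run a single command as that user instead.

```shell
# Hypothetical helper: try to create and remove a scratch directory.
# Run it as the Elasticsearch service user to reproduce the same
# directory creation that fails in the stack trace.
probe_mkdir() {
  probe="$1/.es-write-probe-$$"
  if mkdir "$probe"; then
    rmdir "$probe"
    echo "ok: can create directories in $1"
  else
    echo "denied: cannot create directories in $1"
    return 1
  fi
}

# Running a one-off command as a user whose login shell is nologin:
#   sudo -u elasticsearch mkdir /etc/middleware/indices/probe-test
#   su -s /bin/sh elasticsearch -c 'mkdir /etc/middleware/indices/probe-test'
```

Run it on every node, since each node mounts the NFS export separately.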
root @ gva6439 [QAS] /etc/middleware $ su - elasticsearch
su: warning: cannot change directory to /nonexistent: No such file or directory
This account is currently not available.
root @ gva6439 [QAS] /etc/middleware $
Our live environments run 8.6, so we wanted to perform the snapshot and restore POC on 8.6. We have not tested on the latest version, 8.11.
Please let me know how to proceed to resolve this issue.
Oh, you're using SELinux. It would have been worth mentioning that up front! That makes it all a lot more complicated, but the problem lies in your SELinux config and not in Elasticsearch so you will need to work with your local sysadmin folks to fix it. As I said, Elasticsearch isn't doing anything magic, it's just trying to create a directory.
I have no update on this. As David mentioned, this does not seem to be an issue with Elasticsearch, but with the underlying permissions on your system and SELinux.
I do not have much experience with SELinux, so I cannot help further.
If SELinux is enabled on your system, you can use the chcon command to change the security context of the /etc/middleware directory to allow Elasticsearch to write to it. Here's an example:
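A minimal sketch, with several assumptions. The helper relabel_repo and its APPLY switch are invented for illustration, and the SELinux type is left as an argument because the right value depends on your local policy. NFS mounts are normally labelled nfs_t from the mount rather than per file, so chcon may be refused there and a mount-level context option may be what is actually needed; treat that as something to confirm with your sysadmins.

```shell
# Dry run by default: prints the command. Set APPLY=1 to execute it.
# The target type is site-specific; no value here is known to be
# right for your policy.
relabel_repo() {
  setype=$1
  dir=$2
  if [ "${APPLY:-0}" = "1" ]; then
    chcon -R -t "$setype" "$dir"
  else
    echo "would run: chcon -R -t $setype $dir"
  fi
}

# Inspect before changing anything:
#   getenforce                    # Enforcing, Permissive, or Disabled
#   ls -Zd /etc/middleware        # current context of the directory
#   ausearch -m avc -ts recent    # recent SELinux denials, if auditd is running
```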
I compared the ls -lZ /etc/middleware output from the working environment (lab servers, a two-node Elasticsearch cluster) with that of the Azure DR servers where we have the snapshot issue. The output of both commands is below for your reference.
The permissions do not appear to differ between the working and non-working environments. In that case, please let me know what needs to be changed on the non-working environment.
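For a comparison like this, one option is to reduce each listing to the fields that matter and diff the results. The sketch below assumes the ten-column ls -lZ layout of recent coreutils and file names without spaces; perm_fingerprint is a made-up name. Using -n as well prints numeric uid/gid, which can expose an ID mismatch between machines that matching user names would hide.

```shell
# Hypothetical helper: reduce "ls -lnZ" output to mode, uid, gid,
# SELinux context and name, dropping sizes and dates that always differ.
# Lines with fewer than 10 fields (such as "total 112") are skipped.
perm_fingerprint() {
  awk 'NF >= 10 { print $1, $3, $4, $5, $NF }'
}

# On the lab server:
#   ls -lnZ /etc/middleware | perm_fingerprint > perms-lab.txt
# On the DR server:
#   ls -lnZ /etc/middleware | perm_fingerprint > perms-dr.txt
# Then copy both files to one machine and compare:
#   diff perms-lab.txt perms-dr.txt
```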
Snapshot and restore work as expected in the lab environment. Output of ls -lZ /etc/middleware there:
[SBX] ~ $ ls -lZ /etc/middleware/
total 112
drwxrwxrwx. 3 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 4096 Oct 27 12:30 backup
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 9083 Jan 16 02:30 index-152
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 8 Jan 16 02:30 index.latest
drwxr-xr-x. 19 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 4096 Dec 17 13:32 indices
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 19312 Jan 13 13:32 meta-6WzpVUYuQKWQdz-PSklMaQ.dat
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 19313 Jan 14 13:32 meta-o69tAELBS-WuTk0CkuGOQQ.dat
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 19311 Jan 15 13:32 meta-vRvG7LJSTtSJGJ-pd-vwvg.dat
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 604 Jan 13 13:32 snap-6WzpVUYuQKWQdz-PSklMaQ.dat
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 601 Jan 14 13:32 snap-o69tAELBS-WuTk0CkuGOQQ.dat
-rw-r--r--. 1 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 599 Jan 15 13:32 snap-vRvG7LJSTtSJGJ-pd-vwvg.dat
drwxr-xr-x. 2 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 4096 Oct 27 13:47 test
drwxrwxrwx. 3 768 768 system_u:object_r:nfs_t:s0 4096 Oct 27 12:51 testrepo
drwxr-xr-x. 2 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 4096 Nov 2 09:15 tests-gptAluNTTgSmOuD_o1kq1Q
drwxr-xr-x. 2 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 4096 Dec 12 13:44 tests-LLUbX7UeTWGgFAJoPsFsFw
drwxr-xr-x. 2 elasticsearch elasticsearch system_u:object_r:nfs_t:s0 4096 Oct 27 12:59 tests-wIF1pmNeT0maTfbnvrQ-aA
-rwxrwxrwx. 1 root root system_u:object_r:nfs_t:s0 0 Nov 1 12:05 test.txt
Output from Azure DR servers where the snapshot is not working as expected: