Trouble with errors when creating a snapshot

Hello from Japan
I have a question for the Elastic engineers here.
I want to use the snapshot feature to save snapshots of the cluster on one node.
*I have configured a cluster of three Elasticsearch machines.

I have installed Samba on the server that will hold the snapshot repository.
I am trying to create a shared folder with Samba and store the repository there.
However, I keep running into errors.


Authentication status = Not connected
{
  "name": "ResponseError",
  "message": "repository_verification_exception\n\tCaused by:\n\t\texception: failed to create blob container\n\tRoot causes:\n\t\texception: failed to create blob container"
}

Error message when the shared folder is hosted on the master node.
*Server names and IP addresses are protected for security reasons.

Authentication status = Not connected
{
  "name": "ResponseError",
  "message": "repository_verification_exception\n\tRoot causes:\n\t\trepository_verification_exception: [repo_test] [[EPknCeSwRISYedwlqybijA, 'org.elasticsearch.transport.RemoteTransportException: [server name][xxx.xxx.xxx.xxx:xxxx][internal:admin/repository/verify]'], [-zN11OFCSPSBnvYOYu5kQg, 'org.elasticsearch.transport.RemoteTransportException: [server name][xxx.xxx.xxx.xxx:xxxx][internal:admin/repository/verify]']]"
}

☆My execution environment is as follows:

OS: Ubuntu 22.04
Samba: 4.15.13
Elasticsearch: 8.13.4
Kibana: 8.13.4

Each elasticsearch.yml file contains the repository settings as follows:

path.repo: /hayato
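For reference, with `path.repo` set on every node, the repository is registered and verified like this (a sketch; the repository name repo_test is taken from the error messages below, and localhost:9200 plus any authentication flags are assumptions that depend on your setup):

```shell
# Register a shared-filesystem repository at /hayato.
# path.repo must list /hayato in elasticsearch.yml on every node first.
curl -X PUT "localhost:9200/_snapshot/repo_test" \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/hayato" } }'

# Verification: this is the step that fails with
# repository_verification_exception in the errors below.
curl -X POST "localhost:9200/_snapshot/repo_test/_verify"
```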

Similar issues have been reported on the following sites, and I have investigated them but have been unable to resolve them.

I would appreciate any advice on the cause of the error, how to resolve it, and how to obtain a snapshot.

【Results of my own investigation】
☆I suspected a problem with file system permissions, but I'm not sure of the details.
①The permissions on the shared folder itself are displayed as drwxrwxrwx root root.
When viewed from the other nodes that mount it, they appear as drwxr-xr-x root root and cannot be changed with the chmod, chgrp, or chown commands.
This may be why writing fails and the snapshot cannot be created, but I don't know how to resolve it.
※I have confirmed that I can access the shared folder and write files from my own laptop.
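A quick way to test the permission theory is to check whether the Elasticsearch service user can create a directory under the repository path, since the stack trace below fails inside `createDirectory`. A minimal probe sketch (the path /hayato and the user name elasticsearch are this thread's values; adjust to your setup):

```shell
# Probe whether a directory is writable by the current user.
# mkdir is used (rather than touch) because Elasticsearch creates
# directories during repository verification.
check_writable() {
    probe="$1/es-write-probe.$$"
    if mkdir "$probe" 2>/dev/null; then
        rmdir "$probe"
        echo "writable: $1"
    else
        echo "NOT writable: $1"
    fi
}

# Run this as the Elasticsearch service user on each node, e.g.:
#   sudo -u elasticsearch sh -c '. ./probe.sh; check_writable /hayato'
```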

②I checked the following site and found that the UID and GID of each node were different, so I changed them to the same values, but now elasticsearch.service won't start.

Many NFS implementations match accounts across nodes using their numeric user IDs (UIDs) and group IDs (GIDs) rather than their names. It is possible for Elasticsearch to run under an account with the same name (often elasticsearch) on each node, but for these accounts to have different numeric user or group IDs. If your shared file system uses NFS then ensure that every node is running with the same numeric UID and GID, or else update your NFS configuration to account for the variance in numeric IDs across nodes.
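On the "elasticsearch.service won't start after changing UIDs" point: `usermod -u` changes the account's numeric ID but leaves files on disk owned by the old UID, which is a common reason the service then fails to start. An administrative sketch, assuming the service user is named elasticsearch, 1050 is a UID/GID free on all three nodes, and the standard Debian/Ubuntu package paths:

```shell
# Stop the service before changing IDs.
sudo systemctl stop elasticsearch.service
sudo usermod -u 1050 elasticsearch
sudo groupmod -g 1050 elasticsearch
# Re-own everything Elasticsearch needs; files still owned by the old
# numeric UID are the usual cause of startup failures after this change.
sudo chown -R elasticsearch:elasticsearch \
    /etc/elasticsearch /var/lib/elasticsearch /var/log/elasticsearch
sudo systemctl start elasticsearch.service
```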

The Samba configuration file, smb.conf, is written as follows:

[CCR]
   path = /hayato
   browsable = yes
   writable = yes
   guest ok = yes
   guest only = yes
   read only = no
   force create mode = 777
   force directory mode = 777

I am in a very difficult situation.
Please, fellow Elasticsearch engineers, share your wisdom.
Regards
Thank you

※Reference information
The following is the output of df on a node that mounts the share.
/hayato is mounted as a shared folder on the other nodes.

Filesystem Size Used Avail Use% Mounted on
tmpfs 794M 1.1M 793M 1% /run
/dev/vda1 1.9T 37G 1.9T 2% /
tmpfs 3.9G 28K 3.9G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/vda15 105M 6.1M 99M 6% /boot/efi
tmpfs 794M 4.0K 794M 1% /run/user/1010
tmpfs 794M 4.0K 794M 1% /run/user/1001
//xxx.xxx.xxx.xxx/CCR 1000G 57G 944G 6% /hayato
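Since /hayato is a CIFS mount, the ownership and permissions seen on the mounting nodes are fixed by client-side mount options, not by chmod/chown on the client (which would explain why those commands have no effect there). A hedged /etc/fstab sketch, assuming guest access as in the smb.conf above and a service user named elasticsearch; the server address is a placeholder:

```
# /etc/fstab sketch: uid=/gid= set who appears to own the files on the
# client; file_mode=/dir_mode= set the permissions shown there.
//xxx.xxx.xxx.xxx/CCR  /hayato  cifs  guest,uid=elasticsearch,gid=elasticsearch,file_mode=0664,dir_mode=0775  0  0
```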

Hello @YUUTA.INOUE-JPN! To fix the problem we must first understand what's causing it. The error you've shared just says failed to create blob container but the full response from ES will include more details than just that simple message. There might also be relevant messages in the ES logs. You'll need to share more details.


Good day, @DavidTurner san
*In Japan, it is customary to add 【~san】 to the end of the name of someone you respect.
Thank you for your reply.
I was in such a difficult situation that I am very happy with your reply.

I can provide you with Elasticsearch cluster logs.
If there are any other logs you need, please let me know.
I'm still new to Elasticsearch, so I don't understand everything that's needed yet.

[2024-06-14T16:51:47,958][WARN ][r.suppressed             ] [hostname] path: /_snapshot/repo_test/_verify, params: {repository=repo_
org.elasticsearch.transport.RemoteTransportException: [hostname][IP Address:9300][cluster:admin/repository/verify]
Caused by: org.elasticsearch.repositories.RepositoryVerificationException: [repo_test] path  is not accessible on master node
Caused by: org.elasticsearch.ElasticsearchException: failed to create blob container
        at org.elasticsearch.common.blobstore.fs.FsBlobStore.blobContainer(FsBlobStore.java:61) ~[elasticsearch-8.13.4.jar:?]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:2005) ~[elasticsearch-
        at org.elasticsearch.repositories.RepositoriesService$4.doRun(RepositoriesService.java:499) ~[elasticsearch-8.13.4.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[elast
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.4.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1583) ~[?:?]
Caused by: java.nio.file.AccessDeniedException: /hayato/tests-Q0kMFYEgQYiGThx_n7HgPQ
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:438) ~[?:?]
        at java.nio.file.Files.createDirectory(Files.java:699) ~[?:?]
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:807) ~[?:?]
        at java.nio.file.Files.createDirectories(Files.java:793) ~[?:?]
        at org.elasticsearch.common.blobstore.fs.FsBlobStore.blobContainer(FsBlobStore.java:59) ~[elasticsearch-8.13.4.jar:?]
[2024-06-14T16:46:48,474][WARN ][o.e.r.VerifyNodeRepositoryAction] [hostname] [repo_test] failed to verify repository
org.elasticsearch.repositories.RepositoryVerificationException: [repo_test] store location [/hayato] is not accessible on the node [{xxxxxxxxx}{IP Address}{IP Address:9300}{cdfhilmrstw}{8.13.4}{7000099-8503000}{ml.allocated_processors_double=1.0, ml.max_jvm_size=107374182ersion=10.0.0, ml.machine_memory=2059472896, ml.allocated_processors=1}]
Caused by: java.nio.file.AccessDeniedException: /hayato/tests-Euf88JLhSdiIfLc7hWJZZQ/data--zN11OFCSPSBnvYOYu5kQg.dat
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:261) ~[?:?]
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:482) ~[?:?]
        at java.nio.file.Files.newOutputStream(Files.java:227) ~[?:?]

The path must be accessible in the same location on all master and data nodes. The error message indicates it is not accessible from or configured on at least one of the master nodes.


Well kinda but this is a very generic message, it could be caused by lots of things. The useful one is this one:

That message indicates that the OS is forbidding Elasticsearch from creating the directory /hayato/tests-Q0kMFYEgQYiGThx_n7HgPQ. So there's something wrong with OS-level permissions.


@Christian_Dahlqvist san @DavidTurner san
Thank you both for your replies.

I will look into the file system permissions.
Currently we are using Samba as the shared-directory mechanism, and we suspect that Samba's mechanism or parameters may be affecting this.
※If you know of any websites or official documentation that describe using Samba to build an Elasticsearch snapshot mechanism, please let me know.
Also, I'd like to research file system permissions and share what I find out. Please let me ask you questions again.

Apart from this, we are facing another challenge: the user ID of the Elasticsearch account that creates the snapshot repository.
We understand that to create a snapshot, the Elasticsearch account on each node needs the same permissions and the same user ID.

However, when I try to change the user IDs in Ubuntu to align the three accounts, I again face the issue that the Elasticsearch service will not start.

Similar issues have been reported on GitHub (below), but when I checked GitHub I could not find a solution there.
Aside from changing the IDs of the Elasticsearch users at the OS level, how can I create a snapshot repository?
I would appreciate any advice.

I don't think we have official docs on this topic, sorry, it's more of a sysadmin question than anything directly related to Elasticsearch.

I don't think this is true, you just need to make sure that the user as which Elasticsearch is running on each node has the necessary permissions in your shared filesystem. I don't have any experience with doing this with Samba, nor do I see many other Samba experts around here. I think you'll get a more precise answer from your local sysadmin folks or a more Samba-related forum.
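One concrete way to check what "the necessary permissions" means here is to compare the numeric owner of the repository path with the user Elasticsearch actually runs as, on each node. A sketch using GNU coreutils `stat` (the path /hayato and user name elasticsearch are this thread's values):

```shell
# Print the numeric UID that owns a path (stat -c is GNU syntax).
owner_uid() { stat -c '%u' "$1"; }

# Example session, run on each node:
#   owner_uid /hayato
#   sudo -u elasticsearch id -u
# If the two numbers differ and the directory is not group/world
# writable, an AccessDeniedException like the one above is expected.
```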

@DavidTurner san
Thank you for your reply.
I appreciate your feedback.

I'm a beginner with Elasticsearch, but I checked the official documentation and found the following:

The official documentation seems to require the same UID, but how should I understand this?
I'm sorry that I don't understand Elasticsearch well.
Regards
Thank you

The docs you quote relate to NFS, not Samba, so I don't think they apply in your case. But also they don't say anything about requiring equal IDs - indeed they include a recommendation for how to deal with different IDs across nodes:

ensure that every node is running with the same numeric UID and GID, or else update your NFS configuration to account for the variance in numeric IDs across nodes.


This isn't really an Elasticsearch problem, at least not now we've narrowed it down to the AccessDeniedException coming from the operating system. It's now just a case of fixing some filesystem permissions, but unfortunately this isn't a good place to ask for help about that.


@DavidTurner san
Thank you for your reply.
I will look into directory permissions first.

If any new issues arise, I'd like to discuss this again.
I'll reply here again when that happens.

yuuta


@DavidTurner san and @Christian_Dahlqvist san

Thanks to you, I was able to create a repository on the shared directory. The cause was indeed a filesystem permission issue. Your advice was very helpful. Thank you very much. However, when I tried to restore the cluster from the repository I created, the problem occurred again.

When I tried to restore a snapshot, the following error message was displayed:

※ I perform these operations using the Kibana GUI.

※ These are the error messages displayed on the Kibana GUI.

20240618_TEST:hayato-elcnrb8nqh2gl97_c7dylg/xlSI8zSMTBuKtHPQWeqCoA] cannot restore index [.slo-observability.sli-v2] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name
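The error message itself points at one workaround: restoring under a different name with a rename pattern, rather than closing or deleting the existing (system) indices. A request sketch, assuming repository repo_test, the snapshot name shown in the message, and a hypothetical index pattern `my-index-*` for the indices actually needed:

```shell
# Restore selected indices under new names so existing open indices
# (including system indices) are left alone.
curl -X POST "localhost:9200/_snapshot/repo_test/20240618_TEST/_restore" \
  -H 'Content-Type: application/json' \
  -d '{
        "indices": "my-index-*",
        "rename_pattern": "(.+)",
        "rename_replacement": "restored_$1"
      }'
```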

I understand this error message to mean that the snapshot restore failed because an open index with the same name already exists in the cluster.

I tried to close the indexes, but when I try to close the .apm-agent-configuration or .tasks indexes, for example, the following messages are output and I cannot close them.


action [indices:admin/close] is unauthorized for user [elastic] with effective roles [superuser] on restricted indices [apm-agent-configuration], this action is granted by the index privileges [manage_follow_index,manage,all]


action [indices:admin/close] is unauthorized for user [elastic] with effective roles [superuser] on restricted indices [.tasks], this action is granted by the index privileges [manage_follow_index,manage,all]

Also, the above index cannot be deleted.

Could you please tell me how to solve this problem?

First of all, how should we understand these indexes and data streams that cannot be closed or deleted, in terms of the purpose of creating and restoring snapshots?

I would be grateful if you could let me know.
Regards.

Hmm I'm not sure I can help with this, but it's rather unrelated to your original message. I would suggest you start a new topic about it.

@DavidTurner san
thank you for your reply.
Noted!
I'm very grateful for the support so far.
Thank you!