Snapshots stopped working after switching from a single server to a cluster (getting the server married?)

Simple as that. I'm attempting to take snapshots from the same server as before, which isn't the master anymore. My config has path.repo: ["/var/db/elasticsearch"], and the snapshot directory is /var/db/elasticsearch/backup.

For some reason the repository's index-NNN files stopped updating on the day of the switch, judging by the files' last-modification times, while other files in the repository do seem to be updated. I should note that the backup directory also exists on the other two servers of the cluster, but contains different index-NNN files, which started from 1 and have been increasing by 1 every day. Why is the backup directory part of the cluster at all? Would moving it outside of path.repo fix the problem?
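
For completeness, the repository itself was registered with the usual fs repository API; the repository name backup is real, but I'm reconstructing the exact request from memory, so treat it as a sketch:

curl -X PUT 'http://localhost:9200/_snapshot/backup' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/var/db/elasticsearch/backup"
  }
}'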

I should note that even though I still address the previous master (now a slave?) in my API requests (using curl; see the sketch below), the log entries about snapshot creation and deletion magically appear in the new master's log files:
[2019-07-27T02:31:00,454][INFO ][o.e.s.SnapshotsService ] [myserver.com] snapshot [backup:snapshot/W3FHurPGSa6MkxFDcRHDKQ] deleted
[2019-07-27T02:31:00,466][INFO ][o.e.r.RepositoriesService] [myserver.com] delete repository [backup]
[2019-07-27T02:31:00,493][INFO ][o.e.r.RepositoriesService] [myserver.com] put repository [backup]
[2019-07-27T02:31:00,534][INFO ][o.e.s.SnapshotsService ] [myserver.com] snapshot [backup:snapshot/Vmr9WpUDS3CeRAXRkAJw8A] started
[2019-07-27T02:34:29,411][INFO ][o.e.s.SnapshotsService ] [myserver.com] snapshot [backup:snapshot/Vmr9WpUDS3CeRAXRkAJw8A] completed with state [SUCCESS]

but on both slaves (is this the correct term?) I'm getting this instead:

[2019-07-27T02:31:00,515][WARN ][o.e.r.VerifyNodeRepositoryAction] [myslave.com] [backup] failed to verify repository
org.elasticsearch.repositories.RepositoryVerificationException: [backup] a file written by master to the store [/var/db/elasticsearch/backup] cannot be accessed on the node [{myslave.com}{_7zxUfdrQrSDyollCyytSg}{bN_ptp5jS2mea1SFYcRs9w}{172.16.1.18}{172.16.1.18:9300}{xpack.installed=true}]. This might indicate that the store [/var/db/elasticsearch/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node
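
For reference, the nightly script boils down to something like this (host and options simplified; judging by the log above, it also re-registers the repository each night before snapshotting):

# delete yesterday's snapshot, then take a new one under the same name
curl -X DELETE 'http://localhost:9200/_snapshot/backup/snapshot'
curl -X PUT 'http://localhost:9200/_snapshot/backup/snapshot?wait_for_completion=true'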

Damn. Here it says that a shared filesystem needs to be used for snapshots to work. Why is that required? Is there any way to take snapshots without setting one up?

No, the snapshot is created cluster-wide, so ALL nodes need access to the same repository.
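
You can see exactly which nodes can and cannot read the repository with the verify API:

curl -X POST 'http://localhost:9200/_snapshot/backup/_verify?pretty'

Every data node has to verify successfully, otherwise snapshots of shards held on that node will fail.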


No, it's really not. They're either data nodes (if node.master: false and node.data: true) or just "the other nodes".

You could try setting up a Minio node to emulate AWS S3, so that you can use the repository-s3 plugin.
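
A rough sketch of that setup (the bucket name and endpoint are placeholders): install the plugin on every node and restart, point the default S3 client at your Minio instance, put the credentials in the keystore, then register an s3 repository:

sudo bin/elasticsearch-plugin install repository-s3

# elasticsearch.yml on every node
s3.client.default.endpoint: "minio.example.com:9000"
s3.client.default.protocol: http

# credentials go into the keystore, not the config file
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key

curl -X PUT 'http://localhost:9200/_snapshot/backup_s3' -H 'Content-Type: application/json' -d '
{
  "type": "s3",
  "settings": { "bucket": "es-snapshots" }
}'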


Is there a simpler way? Can I set up the backup with no sharding, only replication? All I need is a snapshot that I can copy to an entirely different machine (a staging environment) daily and restore it there. This worked when the "cluster" consisted of a single node.

At the very least, would taking the backup only on the master work?

How can the NFS server be set up? Should all data nodes be given read+write access to the shared directory?
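
A minimal sketch, assuming a Linux NFS server and that the elasticsearch user maps to the same UID everywhere (the hostname and subnet are examples): export the directory on the NFS server and mount it at the same path, read-write, on every node:

# on the NFS server, /etc/exports
/var/db/elasticsearch/backup 172.16.1.0/24(rw,sync,no_subtree_check)

# on every Elasticsearch node
mount -t nfs nfs-server.example.com:/var/db/elasticsearch/backup /var/db/elasticsearch/backup

The path must be covered by path.repo on every node, and the elasticsearch user must be able to read and write it on all of them.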

Awesome! The NFS setup worked fine, and I could still target any server belonging to the cluster when issuing the requests, not only the master (maybe it had to be a master-eligible one, but that's irrelevant here). Thanks to everyone!

No, as a rule the master does its best to avoid handling any data, but this idea would route the whole snapshot through the master. It's the primary of each shard that takes the snapshot of that shard.

Yes, you can send any request to any node in a cluster and rely on Elasticsearch to make sure it gets to the right place. In larger clusters it's often a good idea to avoid sending any requests directly to the master nodes so that they can focus on coordinating the cluster.

Glad you got it working, thanks for letting us know.


Thanks. I'm a bit wary of the fact that three processes on different machines have full r/w access to the same shared data directory (/var/db/elasticsearch/backup). Will they make sure not to step on each other's toes?

Yes, the master makes sure they all stay out of each other's way when making a snapshot. You can't have more than one cluster writing to the same snapshot repository, because there's no way to coordinate that, but it's fine with just a single cluster.
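
If the staging cluster only ever restores from the repository, you can make that explicit by registering it there as read-only (the readonly setting is supported for fs repositories; the staging hostname here is a placeholder):

curl -X PUT 'http://staging.example.com:9200/_snapshot/backup' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/var/db/elasticsearch/backup",
    "readonly": true
  }
}'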

