Snapshots stopped working after switching from a single server to a cluster (getting the server married?)

Simple as that. I'm attempting to take snapshots from the same server as before, which isn't the master anymore. The relevant setting is path.repo: ["/var/db/elasticsearch"]
Snapshot directory is /var/db/elasticsearch/backup.
For some reason its index-NNN files stopped updating on the day of the switch, judging by the files' last modification times, although other data files still seem to be updated. I should note that the backup directory also exists on the other two servers of the cluster, but there it holds different index-NNN files, which started at 1 and go up by 1 every day. Why is the backup directory part of the cluster at all? Would moving it outside of path.repo fix the problem?

I should note that even though I still address the previous master (now a slave) in my API requests (using curl), the log entries about snapshot creation and so on magically appear in the master's log files:
[2019-07-27T02:31:00,454][INFO ][o.e.s.SnapshotsService ] [] snapshot [backup:snapshot/W3FHurPGSa6MkxFDcRHDKQ] deleted
[2019-07-27T02:31:00,466][INFO ][o.e.r.RepositoriesService] [] delete repository [backup]
[2019-07-27T02:31:00,493][INFO ][o.e.r.RepositoriesService] [] put repository [backup]
[2019-07-27T02:31:00,534][INFO ][o.e.s.SnapshotsService ] [] snapshot [backup:snapshot/Vmr9WpUDS3CeRAXRkAJw8A] started
[2019-07-27T02:34:29,411][INFO ][o.e.s.SnapshotsService ] [] snapshot [backup:snapshot/Vmr9WpUDS3CeRAXRkAJw8A] completed with state [SUCCESS]

but on both slaves (is this the correct term?) I'm getting this instead:

[2019-07-27T02:31:00,515][WARN ][o.e.r.VerifyNodeRepositoryAction] [] [backup] failed to verify repository
org.elasticsearch.repositories.RepositoryVerificationException: [backup] a file written by master to the store [/var/db/elasticsearch/backup] cannot be accessed on the node [{}{_7zxUfdrQrSDyollCyytSg}{bN_ptp5jS2mea1SFYcRs9w}{}{}{xpack.installed=true}]. This might indicate that the store [/var/db/elasticsearch/backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node

Damn. Here it says that a shared filesystem needs to be used for snapshots to work. What is it for? Any way to take snapshots without setting it up?

No, the snapshot is created cluster-wide, so ALL nodes need to have access to the same repository.
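
For illustration, here is a minimal sketch of the two requests involved: registering a shared-filesystem repository and asking every node to verify that it can reach it. The repository name and location come from the posts above; the base URL is an assumption, and the helper names are hypothetical. The functions only build the requests, and sending them is left to the caller.

```python
import json
import urllib.request


def register_fs_repo_request(base_url, repo, location):
    """Build (but do not send) a PUT /_snapshot/<repo> request for an fs repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def verify_repo_request(base_url, repo):
    """Build a POST /_snapshot/<repo>/_verify request, which checks access from each node."""
    return urllib.request.Request(f"{base_url}/_snapshot/{repo}/_verify", method="POST")


if __name__ == "__main__":
    # Assumed address; any node of the cluster should do.
    base = "http://localhost:9200"
    for req in (
        register_fs_repo_request(base, "backup", "/var/db/elasticsearch/backup"),
        verify_repo_request(base, "backup"),
    ):
        print(req.get_method(), req.full_url)
        # To actually send it: urllib.request.urlopen(req)
```

If the location is not the same shared directory on every node, verification is expected to fail with a RepositoryVerificationException like the one quoted above.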


No, it's really not. They're either data nodes (if node.master: false and node.data: true) or just "the other nodes".

You could try setting up a Minio node to emulate AWS S3, so that you can use the repository-s3 plugin.


Is there a simpler way? Can I set up the backup with no sharding, but only with replication? All I need is a snapshot that I can copy to an entirely different machine (staging environment) daily and restore there. This worked when the "cluster" consisted of a single node.

At the very least, would taking the backup on the master only work?

How can the NFS server be set up? Should all data nodes be given read+write access to the shared directory?

Awesome! The NFS setup worked fine, and I could still target any server belonging to the cluster when issuing the queries, not only the master (maybe it had to be a master-eligible one, but this is irrelevant). Thanks to everyone.

No, as a rule the master does its best to avoid handling any data, but this idea would route the whole snapshot through the master. It's the primary of each shard that takes the snapshot of that shard.

Yes, you can send any request to any node in a cluster and rely on Elasticsearch to make sure it gets to the right place. In larger clusters it's often a good idea to avoid sending any requests directly to the master nodes so that they can focus on coordinating the cluster.
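
As a rough sketch of what that means in practice, the same create-snapshot request can be built against any node's address. The host names below are hypothetical, the repository and snapshot names are the ones from the log lines above, and the helper only builds the request without sending it.

```python
import urllib.request

# Hypothetical node addresses; substitute the real ones.
NODES = [
    "http://es-node-1:9200",
    "http://es-node-2:9200",
    "http://es-node-3:9200",
]


def create_snapshot_request(node_url, repo, snapshot, wait=True):
    """Build a PUT /_snapshot/<repo>/<snapshot> request addressed to one node."""
    url = f"{node_url}/_snapshot/{repo}/{snapshot}"
    if wait:
        # Ask Elasticsearch to respond only once the snapshot has finished.
        url += "?wait_for_completion=true"
    return urllib.request.Request(url, method="PUT")


if __name__ == "__main__":
    # The request has the same shape whichever node receives it.
    for node in NODES:
        req = create_snapshot_request(node, "backup", "snapshot")
        print(req.get_method(), req.full_url)
        # To actually send it to one node: urllib.request.urlopen(req)
```

Only one of these needs to be sent, of course; the loop just shows that the target node is interchangeable.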

Glad you got it working, thanks for letting us know.


Thanks. I'm a bit wary of the fact that three processes on different machines have full r/w access to the same shared data directory (/var/db/elasticsearch/backup). Will they make sure not to step on each other's toes?

Yes, the master makes sure that they all stay out of each other's way when making a snapshot. You can't have more than one cluster writing to the same snapshot repository, because there's no way to coordinate that, but it's fine if it's just a single cluster.
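
For the staging use case mentioned earlier, one way to keep a second cluster from writing to the repository is to register it there with the fs repository's readonly setting and only restore from it. The sketch below assumes a staging node at a hypothetical address and hypothetical helper names; it builds the requests without sending them.

```python
import json
import urllib.request


def readonly_repo_request(base_url, repo, location):
    """Build a PUT /_snapshot/<repo> request that registers the repository read-only."""
    body = {"type": "fs", "settings": {"location": location, "readonly": True}}
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def restore_request(base_url, repo, snapshot):
    """Build a POST /_snapshot/<repo>/<snapshot>/_restore request."""
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}/{snapshot}/_restore", method="POST"
    )


if __name__ == "__main__":
    staging = "http://staging-es:9200"  # hypothetical staging node
    for req in (
        readonly_repo_request(staging, "backup", "/var/db/elasticsearch/backup"),
        restore_request(staging, "backup", "snapshot"),
    ):
        print(req.get_method(), req.full_url)
        # To actually send it: urllib.request.urlopen(req)
```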
