We have a multi-node ES 5.2.2 cluster running on CentOS 6.x.
We want to register a shared filesystem repository to take a snapshot.
However, due to some infrastructure constraints, we are unable to provide a single volume of the required size to be presented as a single shared filesystem to all the nodes. We can provide 2 (or more) shared filesystems to all the nodes, with their combined size being sufficient to hold the snapshot.
Does the filesystem repository configuration support more than one shared filesystem, so that ES will distribute the snapshot data across all the shared filesystems?
I read in the documentation that path.repo can be configured with multiple folders, but it wasn't clear whether the above is supported.
You can register multiple shared filesystems, but each one needs to be assigned to its own filesystem repository, meaning that a filesystem repository can only be backed by a single shared filesystem.
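To make the one-repository-per-filesystem point concrete, here is a minimal sketch that builds the registration requests for two filesystem repositories. The repository names and mount points are hypothetical, and the sketch only prints the requests rather than sending them. It assumes each location is also listed under path.repo in elasticsearch.yml on every node.

```python
import json

# Hypothetical repository names and mount points. Each location must also
# be listed under path.repo in elasticsearch.yml on every node.
REPOSITORIES = {
    "repo1": "/mnt/sharedfolder1",
    "repo2": "/mnt/sharedfolder2",
}


def registration_requests(repositories):
    """Return one (method, path, body) tuple per filesystem repository."""
    requests = []
    for name, location in sorted(repositories.items()):
        # One "fs" repository points at exactly one shared filesystem location.
        body = {"type": "fs", "settings": {"location": location}}
        requests.append(("PUT", "/_snapshot/" + name, json.dumps(body, sort_keys=True)))
    return requests


for method, path, body in registration_requests(REPOSITORIES):
    print(method, path, body)
```

Each printed line can be sent to the cluster with whatever HTTP client you already use.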
Moreover, Elasticsearch won't automatically distribute snapshots between multiple filesystem repositories. Each snapshot goes to exactly one filesystem repository (and therefore one shared filesystem), and any logic that distributes snapshots between repositories must be implemented in the client.
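As an illustration of what such client-side logic could look like, the sketch below assigns indices to repositories with a simple first-fit-decreasing pass. This is just one possible approach, not something Elasticsearch provides. The function name, the index sizes, and the repository capacities are all hypothetical; in practice you would plug in your own size estimates.

```python
def assign_indices(index_sizes_gb, repo_capacity_gb):
    """First-fit decreasing: put each index in the first repository with room.

    Both arguments are plain dicts of name -> size in GB (hypothetical
    estimates supplied by the caller, not values read from the cluster).
    """
    free = dict(repo_capacity_gb)
    plan = {repo: [] for repo in repo_capacity_gb}
    # Largest indices first; ties broken by name so the result is stable.
    ordered = sorted(index_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0]))
    for index, size in ordered:
        for repo in sorted(free):
            if size <= free[repo]:
                plan[repo].append(index)
                free[repo] -= size
                break
        else:
            # No repository had enough free space for this index.
            raise ValueError("no repository has room for " + index)
    return plan


if __name__ == "__main__":
    sizes = {"index1": 40, "index2": 30, "index3": 50, "index4": 20}
    capacities = {"repo1": 80, "repo2": 80}
    print(assign_indices(sizes, capacities))
```

The function raises an error instead of silently skipping an index, so a set that does not fit is noticed before any snapshot is started.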
Would the following therefore work in order to back up all our data and be able to restore it?
Divide the indices into sets such that the expected snapshot size of each set fits into one shared filesystem. Let's assume the sets are set 1 (index1, index2) and set 2 (index3, index4).
Create filesystem repository #1 linked with sharedfolder1 and perform a snapshot of index1 and index2
Create filesystem repository #2 linked with sharedfolder2 and perform a snapshot of index3 and index4
In order to restore the above, we would do the same: index1 and index2 can be restored only from repository #1, and index3 and index4 can be restored only from repository #2.
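The plan above can be sketched as the requests it would generate. This is only an illustration: the repository names (repo1, repo2) and the snapshot name are hypothetical, and the code prints the snapshot and restore calls instead of sending them.

```python
import json

# The two sets from the plan above, keyed by hypothetical repository names.
PLAN = {
    "repo1": ["index1", "index2"],
    "repo2": ["index3", "index4"],
}


def snapshot_requests(plan, snapshot_name):
    """One snapshot per repository, limited to that repository's indices."""
    requests = []
    for repo in sorted(plan):
        body = json.dumps({"indices": ",".join(plan[repo])})
        path = "/_snapshot/{}/{}?wait_for_completion=true".format(repo, snapshot_name)
        requests.append(("PUT", path, body))
    return requests


def restore_requests(plan, snapshot_name):
    """Each set is restored from the repository that holds its snapshot."""
    requests = []
    for repo in sorted(plan):
        body = json.dumps({"indices": ",".join(plan[repo])})
        path = "/_snapshot/{}/{}/_restore".format(repo, snapshot_name)
        requests.append(("POST", path, body))
    return requests


if __name__ == "__main__":
    for request in snapshot_requests(PLAN, "snapshot_1"):
        print(*request)
    for request in restore_requests(PLAN, "snapshot_1"):
        print(*request)
```

Note that when restoring over an index that already exists in the cluster, that index has to be closed (or deleted) first.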