I have a couple of questions about cluster data backup & restore, thanks.
There are 3 nodes in the cluster. Is it OK to execute the backup command below on any of the nodes?
PUT _snapshot/my_backup/snapshot_1
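For context, snapshot_1 can only be taken after a repository named my_backup has been registered. A minimal sketch of both steps, assuming a shared-filesystem repository mounted at /mount/backup (the repository name and mount path are taken from this thread; localhost:9200 is a placeholder address):

```shell
# elasticsearch.yml on every node must whitelist the mount point first:
#   path.repo: ["/mount/backup"]

# Register a shared-filesystem repository named my_backup
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": { "location": "/mount/backup" }
}'

# Then take the snapshot (any node can receive this request)
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"
```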
The three nodes in the Elasticsearch cluster are in three different domains, and each domain has a separate shared storage system. The shared volume (storage system) in each domain is only visible to the nodes in that same domain, so all I can do is attach a different volume to each Elasticsearch node in each domain. But according to the documentation, the shared filesystem path must be accessible from all nodes in the cluster! I can make sure each node uses the same mount path, e.g. /mount/backup, but the backing volumes are actually different.
So my question is: can I back up & restore data in such a case?
The shared filesystem used for snapshots must indeed be accessible to all nodes, so what you are describing will unfortunately not work. You might, however, be able to use an S3-backed repository instead if that is allowed.
@Christian_Dahlqvist Thanks for the answer. I am not allowed to use AWS S3 or Azure cloud storage. Instead, I can only use the storage service provided by our own cloud infrastructure. So it seems I would have to implement an Elasticsearch plugin to support it. Is that feasible?
What do you think of question 1? Actually, I deployed Elasticsearch in a Kubernetes cluster and started 3 Elasticsearch pods. When I execute the backup command "PUT _snapshot/my_backup/snapshot_1" against the Kubernetes service's cluster IP address, the command is routed to just one of the backend Elasticsearch pods, so only one of the Elasticsearch instances receives it. I am not sure whether there are any potential problems with this. Can you please clarify? Thanks.
Any node in the cluster can receive any command. Did you verify that your three nodes successfully formed a cluster, e.g. through the _cat/nodes API? Did you set minimum_master_nodes to 2 to avoid any split-brain scenarios as described here?
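A quick sketch of both checks mentioned above, assuming the nodes are reachable on localhost:9200 (placeholder address) and a pre-7.x cluster, where minimum_master_nodes applies:

```shell
# Verify that all three nodes formed one cluster; the elected master is marked with '*'
curl -X GET "localhost:9200/_cat/nodes?v"

# In elasticsearch.yml on each node, for a 3-node cluster set:
#   discovery.zen.minimum_master_nodes: 2   # quorum = (3 / 2) + 1
```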
@Christian_Dahlqvist Yes, I think the three nodes have already formed a cluster successfully. I recall that each Elasticsearch instance logged a message indicating it joined the cluster successfully. But it's a good point to verify this with the _cat/nodes API; I will try it later, thanks for the info.
Do you think it is feasible to implement an Elasticsearch plugin to support our own storage service, in the same way as was done for AWS S3 or Azure cloud?
I do not know, as I have never developed a plugin myself. You should, however, be able to use one of the existing plugins as a template/example, which should make it easier than creating one from scratch.
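One hedged alternative to writing a plugin from scratch: if the in-house storage service happens to expose an S3-compatible API, the existing repository-s3 plugin can sometimes be pointed at it via a custom endpoint. A sketch, where storage.internal.example:9000 and the bucket name es-snapshots are placeholders for your service:

```shell
# Install the official S3 repository plugin on every node, then restart the nodes
bin/elasticsearch-plugin install repository-s3

# elasticsearch.yml: point the default S3 client at the internal endpoint
#   s3.client.default.endpoint: "storage.internal.example:9000"

# Register the repository against a bucket on that service
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": { "bucket": "es-snapshots" }
}'
```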