Hi,
First, I want to thank the community for the help, assistance and ideas.
We currently run a 3 master node cluster without any problem.
I want to add a kind of 4th node that does not actively take part in the primary cluster but receives all the data from the 3-node main cluster. I then want to use this single node for cloning purposes.
That means this single node should contain the whole dataset, which is currently spread across the 3 cluster nodes. I then want to take filesystem-level snapshots on it.
Is this possible with some kind of remote-eligible node, or is my idea just silly :-)? I am not sure, because normally data is spread across all nodes.
If you add a node to the cluster, it will be part of the cluster. Elasticsearch does not have nodes that exist only for backup and things like that; all nodes are part of the cluster.
What you want to do is called cross-cluster replication (CCR), where you replicate your data to another cluster, but that is a completely separate cluster. This is a paid feature; you would need at least a Platinum license for both clusters.
To partially replicate this with the basic license, you would need to do it before ingestion, which means you would need to send your data to both clusters.
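As a rough illustration of sending the same data to both clusters, here is a minimal Python sketch. It only uses the standard library and the `_bulk` endpoint; the function names (`to_bulk_ndjson`, `dual_write`, `http_bulk_sender`) and the cluster URLs are made up for this example, and a real ingest pipeline would also need authentication, retries and a way to re-send data a cluster missed while it was offline.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, List

# Hypothetical sender signature: (cluster_url, ndjson_body) -> None, raising on failure.
Sender = Callable[[str, str], None]


def to_bulk_ndjson(index: str, docs: Iterable[dict]) -> str:
    """Serialize documents into the newline-delimited body the _bulk API expects."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


def http_bulk_sender(cluster_url: str, body: str) -> None:
    """Send one bulk request over HTTP (no auth or TLS handling in this sketch)."""
    req = urllib.request.Request(
        cluster_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def dual_write(clusters: List[str], index: str, docs: Iterable[dict],
               send: Sender = http_bulk_sender) -> Dict[str, bool]:
    """Send the same batch to every cluster and report which ones accepted it."""
    body = to_bulk_ndjson(index, list(docs))
    results: Dict[str, bool] = {}
    for url in clusters:
        try:
            send(url, body)
            results[url] = True
        except Exception:
            # One cluster being down should not block the other in this sketch;
            # the caller has to decide how to replay the failed batch later.
            results[url] = False
    return results
```

A call would look like `dual_write(["http://main:9200", "http://copy:9200"], "logs", docs)`, with both URLs being placeholders. Note that this only checks that the HTTP request succeeded; the `_bulk` response can still report per-document errors, which this sketch does not inspect.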
Also, it is not clear what you want to achieve with this scenario. Wouldn't it be easier and cheaper to take the snapshots directly on your main cluster or use a cloud bucket?
@leandrojmp
I assumed that I would need the Platinum license for it.
But I think the idea of sending the data to both clusters should work.
Snapshots are too slow, so I need a fast way to take a low-level, filesystem-based snapshot. If I replicated the data to a 2nd cluster with only one node, I could take filesystem-based snapshots from there. For 300 GB of data, I think this would be the fastest way?
Cross cluster replication needs a license for both clusters, so you would need 2 platinum licenses at least.
Keep in mind that, depending on how you are indexing your data, one cluster being offline can impact the other. In some cases you cannot do this at all: for example, the Elastic Agent with the Defend integration can only send data to Elasticsearch, so you can only send data to one cluster.
And how would you restore the data on your main cluster? Would you use an NFS path shared between the two clusters and use one cluster just to create the snapshots?
Not sure what is the advantage of that.
This depends on many factors. I can't see the difference between doing the snapshots on your main cluster and using a different cluster for that.
Elasticsearch does not support filesystem-level snapshots, so you would not be able to bootstrap a new cluster from one, unless you perhaps add nodes to the single-node cluster (which would not require any snapshot). The only supported backup methodology is the snapshot/restore API. If you have a single-node cluster, the snapshot repository could be created on the local file system, but to use it you would still need to mount it and restore it to a new cluster.
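To make the snapshot/restore route a bit more concrete, here is a minimal Python sketch of the REST calls involved with a shared file system (`fs`) repository. The repository name, snapshot name, path and helper function names are made up for this example, and it assumes the repository location is already listed under `path.repo` in `elasticsearch.yml` on every node that needs to see it.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# (HTTP method, URL, JSON body or None)
Request = Tuple[str, str, Optional[dict]]


def build_snapshot_requests(base_url: str, repo: str, location: str,
                            snapshot: str) -> List[Request]:
    """Requests to register an fs repository and take one snapshot into it."""
    return [
        # Register the repository; `location` must be allowed via path.repo.
        ("PUT", f"{base_url}/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # Create the snapshot and block until it has finished.
        ("PUT", f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         None),
    ]


def build_restore_request(base_url: str, repo: str, snapshot: str,
                          indices: str) -> Request:
    """Request to restore the given indices from a snapshot on the target cluster."""
    return ("POST", f"{base_url}/_snapshot/{repo}/{snapshot}/_restore",
            {"indices": indices})


def send(request: Request) -> dict:
    """Execute one request against Elasticsearch (no auth or TLS in this sketch)."""
    method, url, body = request
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

On the source cluster you would run `send(...)` for each request from `build_snapshot_requests("http://localhost:9200", "my_fs_repo", "/mnt/backups", "snap_1")`. On the new cluster, the same repository has to be registered against the mounted path before the restore request is sent.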