Remote eligible node for backup purposes?

rgoerner · April 12, 2024, 11:42am

Hi,
first, I want to thank the community for the help, assistance and ideas.

We currently run a 3 master node cluster without any problem.
I want to add a kind of 4th node that does not actively run in the primary cluster but receives all the data from the 3-node main cluster. I then want to use this single node for cloning purposes.

That means this single node should contain the whole dataset, which is currently spread across 3 cluster nodes. I then want to take filesystem-level snapshots on it.
Is this possible with a kind of remote eligible node, or is my idea just silly :-)? I am not sure, because normally data is spread across all nodes.

Thanks
Ronny

rgoerner · April 12, 2024, 11:53am

I also had another, maybe strange, idea:

use haproxy as the endpoint
send data to the main cluster
additionally send the data to a separate single-node cluster with one master node
Should this work?

leandrojmp · April 12, 2024, 12:14pm

If you add a node to the cluster, it will be part of the cluster. Elasticsearch does not have nodes that work only for backup and things like that; all nodes are part of the cluster.

What you want to do is called Cross Cluster Replication, where you replicate your data to another cluster, but it is a completely separate cluster. This is a paid feature; you would need at least a Platinum license for both clusters.
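
For reference, here is a rough sketch of what the CCR setup looks like from the follower cluster's side, in Python with only the standard library. The URL, the remote cluster alias `main` and the index names are placeholders I made up for illustration. The two endpoints are the documented ones: `_cluster/settings` with `cluster.remote.<alias>.seeds`, and `<follower_index>/_ccr/follow`. The code only builds the requests and sends nothing, and it assumes both clusters already have the required license.

```python
import json
from urllib.request import Request


def remote_cluster_request(follower_url, alias, seeds):
    """Build the request that registers the leader cluster on the follower.

    `seeds` are transport addresses of the leader cluster (host:9300 by default).
    """
    body = {"persistent": {"cluster": {"remote": {alias: {"seeds": seeds}}}}}
    return Request(
        f"{follower_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def follow_request(follower_url, follower_index, alias, leader_index):
    """Build the request that creates a follower index for one leader index."""
    body = {"remote_cluster": alias, "leader_index": leader_index}
    return Request(
        f"{follower_url}/{follower_index}/_ccr/follow",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Each request would then be sent with `urllib.request.urlopen` (plus authentication) against the follower cluster, one follow request per index you want replicated.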

To partially replicate this with the basic license you would need to do that before ingestion, which means that you would need to send your data to both clusters.
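
A minimal sketch of that "send to both clusters" approach on the client side, in Python with only the standard library. The function names and the idea of passing one sender per cluster are my own assumptions for illustration; only the `_bulk` endpoint and its NDJSON body format come from Elasticsearch itself.

```python
import json
from urllib.request import Request, urlopen


def bulk_body(index, docs):
    """Serialize docs as NDJSON for the _bulk API (must end with a newline)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def http_sender(base_url):
    """Return a callable that POSTs a bulk body to one cluster (hypothetical helper)."""
    def send(body):
        req = Request(
            f"{base_url}/_bulk",
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/x-ndjson"},
            method="POST",
        )
        # Note: a 200 response can still contain per-document errors,
        # which this sketch does not inspect.
        with urlopen(req, timeout=10) as resp:
            resp.read()
    return send


def dual_write(index, docs, senders):
    """Send the same bulk body to every cluster; return the names that failed.

    `senders` maps a cluster name to a callable that takes the NDJSON body.
    """
    body = bulk_body(index, docs)
    failed = []
    for name, send in senders.items():
        try:
            send(body)
        except OSError:  # urllib's URLError and connection errors are OSErrors
            failed.append(name)
    return failed
```

Usage would be something like `dual_write("logs", docs, {"main": http_sender("http://main:9200"), "backup": http_sender("http://backup:9200")})`. Whatever comes back in the failed list has to be queued and retried by you, otherwise the two clusters silently drift apart.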

Also, it is not clear what you want to achieve with this scenario. Wouldn't it be easier and cheaper to take the snapshots directly on your main cluster, or to use a cloud bucket?

rgoerner · April 12, 2024, 12:21pm

@leandrojmp
I assumed that I would need the Platinum license for it.
But I think the idea of sending the data to both clusters should work.

Snapshots are too slow, so I need a fast way to take a low-level snapshot at the filesystem level. If I replicated the data to a second cluster with only one node, I could take filesystem-based snapshots there. For 300 GB of data, I think this would be the fastest way?

leandrojmp · April 12, 2024, 12:28pm

Cross cluster replication needs a license for both clusters, so you would need 2 platinum licenses at least.

Keep in mind that, depending on how you index your data, one cluster being offline can impact the other. In some cases you cannot do this at all: for example, if you use the Elastic Agent with the Defend integration, it can only send data to Elasticsearch, so you can only send data to one cluster.

And how would you restore the data on your main cluster? Would you use an NFS path shared between the two clusters and use one cluster just to create the snapshots?

Not sure what is the advantage of that.

This depends on many factors. I can't see the difference between doing the snapshots on your main cluster and using a different cluster for that.

Christian_Dahlqvist · April 12, 2024, 12:33pm

Elasticsearch does not support file system level snapshots, so you would not be able to bootstrap a new cluster from one, unless you perhaps add nodes to the single-node cluster (which would not require any snapshot). The only supported backup method is the snapshot/restore API. With a single-node cluster the snapshot repository could be created on the local file system, but to use it you would still need to mount it on a new cluster and restore it there.
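
To make the supported route concrete, here is a small sketch of the three snapshot/restore API calls involved, in Python with only the standard library. The repository name, snapshot name and path used in the usage line are placeholders, and the helper functions are hypothetical; the endpoints (`_snapshot/<repo>`, `_snapshot/<repo>/<snapshot>` and `.../_restore`) are the documented ones. The code only builds the requests, so it assumes you send them yourself with `urllib.request.urlopen` and whatever authentication your cluster needs.

```python
import json
from urllib.request import Request


def _json_request(method, url, body=None):
    """Build a request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def register_fs_repo(base_url, repo, location):
    """Register a shared file system repository.

    `location` must be under a directory listed in `path.repo` in
    elasticsearch.yml on every node of the cluster.
    """
    body = {"type": "fs", "settings": {"location": location}}
    return _json_request("PUT", f"{base_url}/_snapshot/{repo}", body)


def create_snapshot(base_url, repo, snapshot):
    """Create a snapshot and block until it has finished."""
    url = f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return _json_request("PUT", url)


def restore_snapshot(base_url, repo, snapshot, indices):
    """Restore the given indices (comma-separated names or a pattern)."""
    url = f"{base_url}/_snapshot/{repo}/{snapshot}/_restore"
    return _json_request("POST", url, {"indices": indices})
```

For example, `register_fs_repo("http://localhost:9200", "my_fs_backup", "/mnt/es_backups")` builds the registration request. To restore into a different cluster, the same repository location has to be mounted and registered there first.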
