Best way to load initial data onto a new node?

Hello everyone,

I am wondering how to correctly load initial data onto a new node.

Say I have one master node and I want to add two data nodes.
Data is continuously being indexed into my master node, but I also have a backup made with Elasticsearch Curator stored on Amazon S3. Let's say that backup is 10 minutes behind the data on the master.

When launching my new data nodes, should I just have them join the master node's cluster and let them get the data from the master automatically?

Or is it possible to restore the data from S3 onto the two new data nodes and then have them join the cluster? Would they then catch up with the data on the master? And if some files that were backed up to S3 have since been deleted on the master node, would they be deleted on the data nodes too?
I am interested in this second option because it may be faster if the nodes are far away from my master node.

Thank you in advance.


Don't do this.

We strongly discourage clusters spread across multiple geographic locations.

Thank you very much for your input, I really appreciate it.