Best way to load initial data onto a new node?

Xavier_TROMP · March 17, 2016, 11:06am

Hello everyone,

I am wondering how to correctly load initial data to a new node.

Say I have 1 master node and I want to add 2 data nodes.
My master node is continuously indexing new data, and I have a backup made with Elasticsearch Curator available on Amazon S3. Say the backup is 10 minutes behind the data on the master.

When launching my new data nodes, should I just have them join the master node's cluster, so that they automatically get the data from the master?

Or is it possible to restore the data from S3 onto the 2 new data nodes and then have them join the cluster? At that point, will they catch up with the data on the master? What if some files that were backed up to S3 have since been deleted on the master node: will they be deleted on the data nodes too?
I am interested in this 2nd solution as it may be faster if the nodes are far from my master node.

Thank you in advance

warkolm · March 18, 2016, 4:59am

Yes, just have them join the cluster.

Don't do this (restoring from S3 onto the new nodes before they join).

We strongly discourage clusters spread over diverse geographic locations.
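
For anyone who wants to confirm that newly joined nodes have finished receiving their shards, here is a minimal sketch of one way to check. It is not something prescribed in this thread. It assumes the cluster is reachable at `http://localhost:9200` (an assumed address) and that you know how many data nodes you expect (3 in the example, assuming the existing node also holds data). The helper names `fetch_cluster_health` and `recovery_finished` are made up for illustration; the fields read are those returned by the `_cluster/health` endpoint.

```python
import json
import time
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Call GET /_cluster/health and return the parsed JSON response.

    base_url is an assumption; point it at any node of your cluster.
    """
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def recovery_finished(health, expected_data_nodes):
    """Return True once every expected data node has joined and no shard
    is still relocating, initializing, or unassigned."""
    return (
        health["number_of_data_nodes"] >= expected_data_nodes
        and health["status"] == "green"
        and health["relocating_shards"] == 0
        and health["initializing_shards"] == 0
        and health["unassigned_shards"] == 0
    )


if __name__ == "__main__":
    # Poll every 10 seconds until shard allocation has settled.
    # The expected node count (3) is an assumption for this example.
    while not recovery_finished(fetch_cluster_health(), expected_data_nodes=3):
        time.sleep(10)
    print("New nodes have joined and shard allocation has settled.")
```

Run it after starting the new nodes; it only reads cluster state and changes nothing.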

Xavier_TROMP · March 18, 2016, 8:25am

Thank you very much for your input, I really appreciate it.