I've been tasked with migrating a large ES cluster, currently running on EC2 instances in AWS, to a local cluster we want to set up. It is a one-off migration (i.e. we don't need to keep the two clusters in sync). However, there is a time-box on how long the migration is allowed to take.
From what I've read so far in the manual and in this Elasticsearch group and general Google searches, it looks like the recommended approach for migrating a cluster is to use Snapshot and Restore. Snapshot to S3, then set up the local cluster to reference the same S3 repo, and do a restore from there.
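To make sure I have the basic flow right, here is a rough sketch of the REST calls I think this involves. It only builds the requests and does not send anything. The repository name, bucket and region are placeholders I made up, and I'm assuming the S3 repository type is available on both clusters (on 2.x that means the cloud-aws plugin is installed).

```python
import json

def snapshot_and_restore_requests(repo, bucket, region, snapshot):
    """Build (cluster, method, path, body) tuples for a basic S3 snapshot/restore.

    All argument values are placeholders supplied by the caller.
    """
    # The same repository definition is registered on both clusters so that
    # the local cluster can see the snapshots written by the AWS cluster.
    repo_body = {"type": "s3", "settings": {"bucket": bucket, "region": region}}
    return [
        ("aws", "PUT", f"/_snapshot/{repo}", repo_body),
        ("aws", "PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
        ("local", "PUT", f"/_snapshot/{repo}", repo_body),
        ("local", "POST", f"/_snapshot/{repo}/{snapshot}/_restore", None),
    ]

# Hypothetical names, for illustration only.
requests_to_send = snapshot_and_restore_requests(
    repo="migration_repo", bucket="my-es-snapshots",
    region="us-east-1", snapshot="snap_1",
)
for cluster, method, path, body in requests_to_send:
    print(cluster, method, path, json.dumps(body) if body else "")
```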
In reading the Snapshot and Restore documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html), it indicates that the index Snapshot is incremental - i.e. only files created / changed since last snapshot are copied to the repository.
What I'm wondering is either:
Can I take a first Snapshot a couple of days before the scheduled migration and restore the local cluster from it, then at the time of migration take another Snapshot to pick up the delta changes and restore just those deltas into the local cluster? i.e. do the vast majority of the data transfer before the window we have, and only the deltas during that window?
Is there another approach that could be used to do this migration?
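For the first question, this is the order of calls I have in mind, written out as a small sketch. It does not answer whether the second restore really transfers only the changed files, which is the part I'm unsure about; it just lays out the sequence. The repository and snapshot names are placeholders, and I'm assuming the local indices have to be closed before they can be restored over, which is how I read the restore documentation.

```python
def two_phase_plan(repo, early_snapshot, final_snapshot, indices="_all"):
    """Return (phase, cluster, method, path) steps for the two-snapshot idea.

    Names are placeholders; nothing here talks to a cluster.
    """
    return [
        # Bulk of the data moves a couple of days ahead of the window.
        ("before window", "aws", "PUT",
         f"/_snapshot/{repo}/{early_snapshot}?wait_for_completion=true"),
        ("before window", "local", "POST",
         f"/_snapshot/{repo}/{early_snapshot}/_restore"),
        # During the window: a second, incremental snapshot on the AWS side.
        ("in window", "aws", "PUT",
         f"/_snapshot/{repo}/{final_snapshot}?wait_for_completion=true"),
        # Assumption: existing indices must be closed before restoring over them.
        ("in window", "local", "POST", f"/{indices}/_close"),
        ("in window", "local", "POST",
         f"/_snapshot/{repo}/{final_snapshot}/_restore"),
    ]

plan = two_phase_plan("migration_repo", "snap_early", "snap_final")
for phase, cluster, method, path in plan:
    print(f"[{phase}] {cluster}: {method} {path}")
```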
We're running ES 2.3.1. _cluster/stats shows 369 indexes, and _stats/store shows a total size_in_bytes for primaries of 900,930,370,722 - so close to 1 TB. At our data centre transfer rates, moving that much data from AWS into the data centre would take about 15 hours.
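For reference, this is the arithmetic behind that estimate, worked backwards from the two numbers above to the effective transfer rate they imply. The helper for estimating a smaller delta is just for convenience; I don't have a real figure for the delta size yet.

```python
size_bytes = 900_930_370_722   # primaries, from _stats/store
full_transfer_hours = 15       # estimate for the full copy

# Effective rate implied by the two numbers above.
rate_bytes_per_s = size_bytes / (full_transfer_hours * 3600)
rate_mbit_per_s = rate_bytes_per_s * 8 / 1e6

def transfer_hours(n_bytes, bytes_per_s=rate_bytes_per_s):
    """Hours needed to move n_bytes at the given sustained rate."""
    return n_bytes / bytes_per_s / 3600

print(f"implied rate: {rate_bytes_per_s / 1e6:.1f} MB/s ({rate_mbit_per_s:.0f} Mbit/s)")
print(f"full copy: {transfer_hours(size_bytes):.1f} hours")
```

The same helper can be called with the size of the delta once that is known, to see whether it fits in the migration window.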