Migration of ES from AWS to non-AWS cluster


(rpsandiford) #1

Hi, all.

I've been tasked with migrating a large ES cluster, currently running on EC2 instances in AWS, to a local cluster we want to set up. It is a once-only migration (i.e. we don't need to keep the two clusters in sync). However, we have a time-box for how long the migration itself can take.

From what I've read so far in the manual and in this Elasticsearch group and general Google searches, it looks like the recommended approach for migrating a cluster is to use Snapshot and Restore. Snapshot to S3, then set up the local cluster to reference the same S3 repo, and do a restore from there.
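For context, registering a shared S3 repository on both clusters would look roughly like this (a sketch only: the repository name, bucket, and region are placeholders, and on ES 2.x the `cloud-aws` plugin must be installed on every node before the `s3` repository type is available):

```
PUT _snapshot/migration_repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-migration-bucket",
    "region": "us-east-1"
  }
}
```

The same request, with the same repository name and bucket, would be run against the local cluster so it can read the snapshots written from AWS.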

In reading the Snapshot and Restore documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html), it indicates that the index Snapshot is incremental - i.e. only files created / changed since last snapshot are copied to the repository.

What I'm wondering is either:
Can I take a first snapshot a couple of days before the scheduled migration, restore the local cluster from that snapshot, then at migration time take another snapshot to pick up the delta changes and restore just those deltas into the local cluster? In other words, do the vast majority of the data transfer before the window we have, and only the deltas during that window?
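The two-phase workflow described above would look something like this (snapshot names are hypothetical; `wait_for_completion=true` just makes the calls block until the snapshot finishes, which is convenient for scripting):

```
# Days before the window: full snapshot from the AWS cluster
PUT _snapshot/migration_repo/snapshot_1?wait_for_completion=true

# ...restore snapshot_1 into the local cluster...
POST _snapshot/migration_repo/snapshot_1/_restore

# During the window: a second snapshot from AWS only uploads
# segment files created or changed since snapshot_1
PUT _snapshot/migration_repo/snapshot_2?wait_for_completion=true

# Then restore snapshot_2 into the local cluster
POST _snapshot/migration_repo/snapshot_2/_restore
```

Note that although the second *snapshot* is incremental at the file level, each snapshot is logically complete, so restoring `snapshot_2` gives the full state as of that snapshot, not just a diff.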

Or:
Is there another approach that could be used to do this migration?

We're running ES 2.3.1. _cluster/stats shows 369 indexes, and _stats/store shows total size_in_bytes for primaries to be 900,930,370,722 - so just under 1 TB. At our data centre transfer rates, that works out to about 15 hours to move that much data from AWS into the data centre.

Thanks!

Bob Sandiford


(Mark Walkom) #2

Your snapshot idea would work.

Otherwise look at reindex from remote - https://www.elastic.co/guide/en/elasticsearch/reference/5.1/docs-reindex.html#reindex-from-remote
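A reindex-from-remote sketch, per the linked 5.1 docs (host and index names are placeholders; this requires the *destination* cluster to be on 5.x, and the remote host must be whitelisted via `reindex.remote.whitelist` in the destination's elasticsearch.yml):

```
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://old-cluster.example.com:9200"
    },
    "index": "my_index"
  },
  "dest": {
    "index": "my_index"
  }
}
```

This pulls documents over HTTP one index at a time, so for ~1 TB across 369 indexes it would likely be slower than snapshot/restore, but it works across major versions.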


(rpsandiford) #3

Did some prototyping. Did a snapshot of an existing cluster. Then added a new index and a few documents to it, and did another snapshot. Then did some messing with that new index (deleted, changed, and added docs) and did a third snapshot.

Then did a restore to the first snapshot, then the second, and then the third, checking what was there for each step.
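One detail worth noting from that kind of prototyping: a restore fails for any index that is already open in the target cluster, so existing indices have to be closed (or deleted) before restoring the later snapshot. A sketch, with a placeholder index name (restored indices are reopened automatically when the restore completes):

```
# Close the existing copy before restoring over it
POST /my_index/_close

# Restore the newer snapshot; restricting "indices" is optional
POST _snapshot/migration_repo/snapshot_2/_restore?wait_for_completion=true
{
  "indices": "my_index"
}
```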

It would perhaps be good to make this explicit in the snapshot/restore documentation: that you can do repeated snapshots / restores for a migration, i.e. do the 'big' one in advance and restore it into the new cluster, then take a final delta snapshot at migration time and restore that final snapshot into the new cluster.

Thanks!

Bob.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.