Migrate from co-location to the cloud without downtime(?)

We want to move out of our co-location facility (Sweden) and into AWS EC2 in the USA (i3.2xlarge instances). We would maintain the cluster ourselves rather than use the managed service.

We have an 18 data node setup at the co-location facility. We are running Elasticsearch 5.6 with two replicas enabled. The nodes use a rack attribute (rack1, rack2, rack3; 6 nodes per rack) and rack awareness is enabled, so every copy of a shard lives in a different rack.

Our idea for the migration is to set up a "new rack" in AWS. More specifically:

  • Add another replica on the indices (nothing should happen, since we only have 3 racks and there is nowhere to place the new copies)
  • Set up a VPN between our co-lo and the VPC at AWS
  • Create 6 data nodes in AWS and give them the rack attribute "rack4"
  • Join them to the cluster; the third replica should then be initialized
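
For reference, the steps above could be sketched roughly as the request bodies below. This is only a sketch under assumptions: the awareness attribute is taken to be named `rack` (set on the new nodes as `node.attr.rack: rack4` in `elasticsearch.yml`), `my_index` is a placeholder index name, and forced awareness is used here as one way to keep the extra replica unassigned until rack4 nodes join. The thread does not say whether forced awareness is configured, so check this against the actual cluster settings before relying on it.

```python
import json

# Assumptions, not taken from the thread: attribute name and index name.
RACK_ATTR = "rack"
INDEX_NAME = "my_index"
EXISTING_RACKS = ["rack1", "rack2", "rack3"]
NEW_RACK = "rack4"


def awareness_settings(racks):
    """Body for PUT /_cluster/settings enabling forced awareness over `racks`."""
    force_key = "cluster.routing.allocation.awareness.force.%s.values" % RACK_ATTR
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": RACK_ATTR,
            force_key: ",".join(racks),
        }
    }


def replica_settings(count):
    """Body for PUT /<index>/_settings changing the replica count."""
    return {"index": {"number_of_replicas": count}}


def migration_plan():
    """Ordered (method, path, body) tuples; nothing is sent over the network."""
    return [
        # Declare rack4 as an expected value first, so a fourth copy has to
        # wait for a rack4 node instead of landing in rack1-3.
        ("PUT", "/_cluster/settings", awareness_settings(EXISTING_RACKS + [NEW_RACK])),
        # Two replicas today, three once the new rack is meant to receive a copy.
        ("PUT", "/%s/_settings" % INDEX_NAME, replica_settings(3)),
    ]


if __name__ == "__main__":
    for method, path, body in migration_plan():
        print(method, path)
        print(json.dumps(body, indent=2))
```

The bodies can be sent with curl or any HTTP client once the VPN is up and the rack4 nodes are ready to join.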

Does this approach sound safe?

We are using "prefer rack" on all our queries, so our applications should still target rack1-3 (for the most part).

What I'm worried about is the connectivity. The latency is 120ms on average. But worse, what happens when the VPN connection breaks? If it's down for a few hours, will the cluster recover safely when it's back?

Other risks?

Welcome! I'd probably look at cross cluster replication instead. See https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html

Thanks David!
CCR seems great for this purpose, but it would require us to first migrate to version 6.7 (I think) and, more importantly, to opt in to the Platinum package.

Can you describe your use case in some detail? Are you just indexing new data or also updating documents?

Sure,
we index roughly 100k items per hour and do even more updates, probably 500k or more.

The index size is currently 4TB (without replicas), spread across 24 shards.
Since the shards are bigger than we want, we are planning to reindex into several indices with different shard counts depending on tenant.
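
To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It is a rough sketch that assumes 1 TB = 1000 GB, evenly sized shards, and one full copy of the data per rack of 6 nodes (as described earlier in the thread); real shard sizes will vary.

```python
PRIMARY_TB = 4.0      # index size without replicas, from the post above
SHARDS = 24           # primary shard count, from the post above
NODES_PER_RACK = 6    # nodes per rack, from the original question


def shard_size_gb(primary_tb, shards):
    """Average size of one shard in GB, assuming evenly sized shards."""
    return primary_tb * 1000 / shards


def cluster_total_tb(primary_tb, replicas):
    """Total data held in the cluster: primaries plus all replica copies."""
    return primary_tb * (1 + replicas)


def per_node_tb(primary_tb, nodes_per_rack):
    """Data per node when each rack holds exactly one full copy."""
    return primary_tb / nodes_per_rack


if __name__ == "__main__":
    print("average shard size (GB):", round(shard_size_gb(PRIMARY_TB, SHARDS), 1))
    print("total with 2 replicas (TB):", cluster_total_tb(PRIMARY_TB, 2))
    print("total with 3 replicas (TB):", cluster_total_tb(PRIMARY_TB, 3))
    print("data per node in a rack (TB):", round(per_node_tb(PRIMARY_TB, NODES_PER_RACK), 2))
```

So each shard averages well over 100 GB, and a new rack would need to receive a full 4TB copy over the VPN before it is in sync.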

I was considering the possibility of feeding two separate clusters in parallel, but with a high proportion of updates that is difficult. As David pointed out, CCR is the best solution, and I cannot really see any other simple workaround that does not require downtime or a potentially large amount of engineering effort.

Splitting a cluster across data centres that far apart is not recommended, and even if you can contain querying to one side of the cluster, it will impact indexing as well as stats gathering. An unreliable connection could naturally also cause stability problems.

Thanks @Christian_Dahlqvist

Do you think "splitting a cluster across data centres" would be more reliable if we go for AWS in the EU? We get a latency of around 50ms to Ireland.

Note that we are not going to query the AWS environment at all until the old environment is torn down.
