Migrate from co-location to the cloud without downtime(?)

jimmie · January 28, 2020, 11:01am

We want to move out of our co-lo (Sweden) and into AWS EC2 in the USA (i3.2xlarge), maintaining the cluster ourselves rather than using the managed service.

We have an 18 data node setup in co-location, running Elasticsearch 5.6 with two replicas enabled. The nodes use a rack attribute (rack1, rack2, rack3; 6 nodes per rack) and rack awareness is enabled, so every copy of a shard lives in a different rack.

Our idea for the migration is to set up a "new rack" in AWS. More specifically:

Add another replica to the indices (nothing will happen, since we only have 3 racks and there is no place to put the new copies)
Set up a VPN between our co-lo and the VPC at AWS
Create 6 data nodes in AWS and give them the rack attribute "rack4"
Join them to the cluster; the third replica should then start initializing
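A minimal sketch of the settings behind the steps above, in Python. It only builds the request bodies and does not talk to a cluster. The index name `my-index` and the attribute name `rack` (that is, `node.attr.rack` in elasticsearch.yml) are assumptions, not taken from the post. The `force.rack.values` setting is also an addition: the Elasticsearch docs describe forced awareness as the way to keep copies unassigned until a node with the missing attribute value joins, so it seems worth verifying on a test cluster whether the extra replica really stays unassigned without it.

```python
import json

INDEX = "my-index"  # hypothetical index name
ATTR = "rack"       # assumed attribute name (node.attr.rack)
RACKS = ["rack1", "rack2", "rack3", "rack4"]


def forced_awareness_settings(attr, values):
    """Body for PUT /_cluster/settings enabling forced awareness."""
    prefix = "cluster.routing.allocation.awareness"
    return {
        "persistent": {
            prefix + ".attributes": attr,
            prefix + ".force." + attr + ".values": ",".join(values),
        }
    }


def replica_settings(replicas):
    """Body for PUT /<index>/_settings changing the replica count."""
    return {"index": {"number_of_replicas": replicas}}


def new_node_config(attr, rack):
    """Line for elasticsearch.yml on each new AWS data node."""
    return "node.attr.{}: {}".format(attr, rack)


if __name__ == "__main__":
    print("PUT /_cluster/settings")
    print(json.dumps(forced_awareness_settings(ATTR, RACKS), indent=2))
    print("PUT /{}/_settings".format(INDEX))
    print(json.dumps(replica_settings(3), indent=2))
    print("# elasticsearch.yml on the new nodes")
    print(new_node_config(ATTR, "rack4"))
```

The order mirrors the plan: awareness settings first, then the extra replica, then the rack4 nodes join.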

Does this approach sound safe?

We are using "prefer rack" on all our queries, so our applications should still target rack1-3 (for the most part).

What I'm worried about is the connectivity. The latency is 120ms on average. But worse, what happens when the VPN connection breaks? If it's down for a few hours, will the cluster recover safely when it's back?
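One setting that looks relevant to the outage question is `index.unassigned.node_left.delayed_timeout`, which delays reallocating replicas after a node leaves the cluster (default `1m`). The sketch below, in Python, builds the settings body and compares a hypothetical outage length against the delay. The `6h` value and the helper names are assumptions for illustration, and the comparison only says whether the delay has expired, not what the cluster then does.

```python
import json

# Seconds per unit for a small subset of Elasticsearch time units.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value):
    """Convert a value such as '6h' or '90s' to seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def delayed_allocation_settings(timeout):
    """Body for PUT /_all/_settings setting the delayed allocation timeout."""
    return {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}


def delay_expired(outage, timeout):
    """True if the outage lasts longer than the configured delay."""
    return to_seconds(outage) > to_seconds(timeout)


if __name__ == "__main__":
    print(json.dumps(delayed_allocation_settings("6h"), indent=2))
    print(delay_expired("3h", "1m"))  # default delay
    print(delay_expired("3h", "6h"))  # raised delay
```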

Other risks?

dadoonet · January 28, 2020, 11:14am

Welcome! I'd probably look at cross cluster replication instead. See https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html

jimmie · January 28, 2020, 1:22pm

Thanks David!
CCR seems great for this purpose, but it would require us to first migrate to version 6.7 (I think) and, more importantly, to opt in to the Platinum package.

Christian_Dahlqvist · January 28, 2020, 1:27pm

Can you describe your use case in some detail? Are you just indexing new data or also updating documents?

jimmie · January 28, 2020, 3:05pm

Sure,
we index roughly 100k items per hour and do even more updates, probably 500k or more.

The index size is currently 4TB (without replicas), spread across 24 shards.
Since the shard size is bigger than we want, we are planning to reindex into several indices with different shard counts depending on tenant.
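A quick back-of-the-envelope check of those figures, sketched in Python. The numbers come from this thread; the decimal conversion (1 TB = 1000 GB) and the function names are assumptions, and "500k or more" is treated as a lower bound.

```python
PRIMARY_TB = 4.0             # index size without replicas
PRIMARY_SHARDS = 24
NODES_PER_RACK = 6
INDEXED_PER_HOUR = 100_000
UPDATED_PER_HOUR = 500_000   # lower bound ("500k or more")


def shard_size_gb(primary_tb, shards):
    """Average primary shard size in GB, assuming 1 TB = 1000 GB."""
    return primary_tb * 1000 / shards


def total_storage_tb(primary_tb, replicas):
    """Storage for the primaries plus every replica copy."""
    return primary_tb * (1 + replicas)


def per_second(per_hour):
    """Convert an hourly rate to a per-second rate."""
    return per_hour / 3600


if __name__ == "__main__":
    print("avg shard size (GB):", round(shard_size_gb(PRIMARY_TB, PRIMARY_SHARDS), 1))
    print("total with 2 replicas (TB):", total_storage_tb(PRIMARY_TB, 2))
    print("total with 3 replicas (TB):", total_storage_tb(PRIMARY_TB, 3))
    print("one full copy per rack4 node (TB):", round(PRIMARY_TB / NODES_PER_RACK, 2))
    print("writes per second:", round(per_second(INDEXED_PER_HOUR + UPDATED_PER_HOUR), 1))
```

So each shard averages roughly 167 GB, and a fourth copy on six new nodes means about 0.67 TB per node that has to cross the VPN before the new rack is in sync.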

Christian_Dahlqvist · January 29, 2020, 5:56am

I was considering the possibility of feeding two separate clusters in parallel, but with a high proportion of updates that is difficult. As David pointed out, CCR is the best solution, and I cannot really see any other simple workaround that does not require downtime or a potentially large amount of engineering effort.

Splitting a cluster across data centres that far apart is not recommended, and even if you can contain querying to one side of the cluster, it will impact indexing as well as stats gathering. An unreliable connection could naturally also cause stability problems.

jimmie · January 29, 2020, 1:29pm

Thanks @Christian_Dahlqvist

Do you think splitting a cluster across data centres would be more reliable if we go for AWS in the EU? We get a latency of around 50ms to Ireland.

Note that we are not going to query the AWS environment at all until the old environment is torn down.

system · February 26, 2020, 1:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
