Using snapshots for replication

eaa · August 3, 2020, 8:27am

Hi everyone,

Say I can't afford the license required for the CCR feature (https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html) but have two Elasticsearch clusters and need to replicate one into the other. Would you say that taking snapshots on the active cluster and restoring them on the passive one every hour is a decent method?

If not, do you know of other ways to set up replication without CCR?

I considered using Logstash to duplicate all incoming events into the passive cluster, but that method can easily go out of sync and would only cover changes from one source of data, which is not enough.

Any ideas or suggestions would be appreciated. Currently I'm leaning toward the snapshot-and-restore approach.

Thanks,

Christian_Dahlqvist · August 3, 2020, 8:31am

I would recommend adding a message queue and having two Logstash instances read from it, each writing to its own cluster. They can get out of sync if there are issues, but a cluster restored from hourly snapshots will always be out of sync.

Steve_Mushero · August 3, 2020, 10:57am

But aren't these different types of out of sync? A restored snapshot just lags by an hour or whatever the window is, but otherwise the data should match exactly. Two queues, processors, etc. can get you out of sync in ways that can't ever be fixed: some data gets dropped, errors out, trips a circuit breaker, etc. (though some errors will retry), and there's no way to re-sync the clusters other than a snapshot, so back to option 1.

But the lag may be significant, like 30 minutes to an hour, and you have to prune aggressively so the snapshots and deletes don't get slower (easier in 7.8, which allows wildcards in snapshot deletes).

Christian_Dahlqvist · August 3, 2020, 11:08am

They are different types of out of sync and the different solutions suffer from different failure scenarios.

eaa · August 3, 2020, 12:00pm

Hi @Steve_Mushero, @Christian_Dahlqvist, thank you for the replies.
A one-hour delay is OK for us in this case. If the choice is between a constant one-hour lag and a potential, unknown out-of-sync state caused by temporary problems, which we'd need to merge or fix manually or with sophisticated scripts, I prefer the more predictable approach.

I'll purge snapshots as soon as I consume them.
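
For readers following along, here is a minimal sketch of what one hourly cycle might look like against the snapshot REST API: a create-snapshot call on the active cluster and a restore call on the passive one. The host URLs, the repository name `shared_repo`, the snapshot name, and the `logs-*` index pattern are all made-up placeholders, and the requests are only built, not sent. Note that restoring over an index that already exists on the passive cluster generally requires closing or deleting that index first.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def snapshot_request(active_url, repo, name):
    """Build (but do not send) a create-snapshot request for the active cluster."""
    url = f"{active_url}/_snapshot/{repo}/{name}?wait_for_completion=true"
    # Skipping global state keeps cluster-wide settings out of the snapshot.
    body = json.dumps({"include_global_state": False}).encode()
    return Request(url, data=body, method="PUT", headers=JSON_HEADERS)


def restore_request(passive_url, repo, name, indices):
    """Build (but do not send) a restore request for the passive cluster."""
    url = f"{passive_url}/_snapshot/{repo}/{name}/_restore"
    body = json.dumps({"indices": indices, "include_global_state": False}).encode()
    return Request(url, data=body, method="POST", headers=JSON_HEADERS)


if __name__ == "__main__":
    # Hypothetical names; pass a request to urllib.request.urlopen to execute it.
    snap = snapshot_request("http://active:9200", "shared_repo", "hourly-2020.08.03-12")
    rest = restore_request("http://passive:9200", "shared_repo", "hourly-2020.08.03-12", "logs-*")
    print(snap.get_method(), snap.full_url)
    print(rest.get_method(), rest.full_url, rest.data.decode())
```

A scheduler (cron or similar) would run the snapshot step first and the restore step only after it completes, which is why the sketch sets `wait_for_completion=true`.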

Steve_Mushero · August 3, 2020, 12:20pm

Just make SURE the second cluster registers the repo in read-only mode, or bad things may happen (two clusters writing to the same repository can corrupt it).
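
As a rough illustration of the read-only registration, the sketch below builds the `PUT _snapshot/<repo>` request the passive cluster would use. It assumes a shared-filesystem (`fs`) repository; the host URL, repository name, and mount path are hypothetical, and other repository types take different settings. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def readonly_repo_request(passive_url, repo, location):
    """Build (but do not send) a request registering an fs repository as read-only."""
    body = {
        "type": "fs",
        "settings": {
            "location": location,  # must point at the same storage the active cluster writes to
            "readonly": True,      # the passive cluster may restore but never write or delete
        },
    }
    return Request(
        f"{passive_url}/_snapshot/{repo}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host, repository name, and mount path.
    req = readonly_repo_request("http://passive:9200", "shared_repo", "/mnt/es_snapshots")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

With the repository read-only on the passive side, all snapshot creation and purging has to run against the active cluster.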

ALSO note that if you fully purge every time, you'll have big, slow snapshots. Snapshots are incremental, so if you purge them all, the next one has to write all the data out again, and again, and again.

So it's better to keep a rolling set, even just one, so that non-changing data is written only once. For example, if you're snapshotting hourly, purge snapshots that are a few hours old, but never purge them all. Depending on your shard count, you might even keep 24 and purge anything more than a day old.
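
The rolling-set idea can be sketched as a small selection function: purge snapshots older than some window, but always protect the newest ones so the next incremental snapshot has something to build on. The function name, the three-hour default, and the snapshot names are illustrative assumptions; the returned names would then be passed to the snapshot delete API on the active cluster.

```python
from datetime import datetime, timedelta


def snapshots_to_purge(snapshots, now, keep_for=timedelta(hours=3), min_keep=1):
    """Return names of snapshots older than `keep_for`, never touching the newest `min_keep`.

    `snapshots` maps snapshot name -> start time (datetime).
    """
    newest_first = sorted(snapshots.items(), key=lambda item: item[1], reverse=True)
    # Always keep at least `min_keep` snapshots, even if they are past the window.
    protected = {name for name, _ in newest_first[:min_keep]}
    cutoff = now - keep_for
    return sorted(
        name
        for name, started in snapshots.items()
        if started < cutoff and name not in protected
    )


if __name__ == "__main__":
    # Hypothetical hourly snapshots taken from 08:00 to 12:00.
    day = datetime(2020, 8, 3)
    snaps = {f"hourly-{h:02d}": day.replace(hour=h) for h in range(8, 13)}
    print(snapshots_to_purge(snaps, now=day.replace(hour=12, minute=30)))
```

Raising `keep_for` to a day gives the "keep 24" variant mentioned above without changing the logic.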

Also, if interested, I just did a blog on how snapshots work (hopefully not too many errors):

