I am considering cross-cluster replication (CCR) as a way to migrate all data, including historical data, from a single-node source Elasticsearch cluster to a multi-node target cluster. Both clusters run version 7.17.7.
The existing cluster was created three years ago and now holds about 1 TB of data across 20+ indices. All indices have index.soft_deletes.enabled = true.
From the CCR documentation, it appears that CCR relies on shard history retention.
My question is: if I start creating follower indices now, will CCR replicate all existing data from the source cluster, or only the data ingested after the follower indices are created?
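For reference, a follower index is created on the target cluster with the create-follower API (PUT /<follower_index>/_ccr/follow). The sketch below only builds that request and does not send it; the index names and the remote cluster alias old_cluster are hypothetical placeholders, and the alias is assumed to already be configured on the target cluster.

```python
import json


def build_follow_request(follower_index: str, remote_cluster: str, leader_index: str) -> tuple:
    """Return (method, path, body) for the CCR create-follower API.

    remote_cluster is the alias of the source cluster as registered on the
    target cluster (hypothetical here); leader_index is the index to follow.
    """
    path = f"/{follower_index}/_ccr/follow"
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    return "PUT", path, body


if __name__ == "__main__":
    # Hypothetical index name and remote alias, for illustration only.
    method, path, body = build_follow_request("my-index", "old_cluster", "my-index")
    print(method, path)
    print(json.dumps(body))
```

One such request would be issued per index to be migrated.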
If all the data is replicated, is there any chance that some records will be lost?
If not, should I create new indices with index.soft_deletes.retention_lease.period set to, for example, 30 or 90 days, reindex into them, and then use the new indices as leader indices for CCR?
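A minimal sketch of the two request bodies this alternative would involve, assuming each new index is created with PUT /<new_index> and then filled with POST _reindex. The index names and the 30d retention value are illustrative only.

```python
def build_leader_index_body(retention_period: str = "30d") -> dict:
    """Body for creating a new leader index with a longer retention lease period."""
    return {
        "settings": {
            "index.soft_deletes.enabled": True,
            # How long a shard history retention lease is kept before it expires.
            "index.soft_deletes.retention_lease.period": retention_period,
        }
    }


def build_reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST _reindex copying every document from source_index into dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # Hypothetical index names.
    print(build_leader_index_body("90d"))
    print(build_reindex_body("old-index", "new-index"))
```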
The main issue with reindexing is that some of the indices have over 1B documents, while the maintenance window is very short (4 hours at most). That means we would have to reindex all 20 indices in under 4 hours; otherwise, the application will not be able to operate.
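A quick back-of-the-envelope check makes the constraint concrete: a single 1B-document index alone would need a sustained rate of roughly 69,000 documents per second over a 4-hour window, before counting the other indices that must fit in the same window.

```python
def required_docs_per_second(doc_count: int, window_hours: float) -> float:
    """Average indexing rate needed to process doc_count documents within window_hours."""
    return doc_count / (window_hours * 3600)


if __name__ == "__main__":
    rate = required_docs_per_second(1_000_000_000, 4)
    print(f"{rate:,.0f} docs/s")  # prints "69,444 docs/s"
```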
My plan was to set up CCR replication and, once all the data has been replicated, perform the cutover.
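One possible way to judge when a follower has caught up before the cutover is to compare leader and follower global checkpoints in the follower stats API (GET /<follower_index>/_ccr/stats). The function below assumes the response contains an indices list whose shards carry leader_global_checkpoint and follower_global_checkpoint; the sample input in the usage example is made up for illustration.

```python
def follower_caught_up(stats: dict) -> bool:
    """Return True if every follower shard's global checkpoint has reached the leader's.

    Assumes the shape of the CCR follower stats response:
    {"indices": [{"index": ..., "shards": [{"leader_global_checkpoint": int,
                                             "follower_global_checkpoint": int, ...}]}]}
    Returns False when no shard stats are present.
    """
    shards = [s for idx in stats.get("indices", []) for s in idx.get("shards", [])]
    if not shards:
        return False
    return all(
        s["follower_global_checkpoint"] >= s["leader_global_checkpoint"] for s in shards
    )


if __name__ == "__main__":
    # Made-up sample response with one lagging shard.
    sample = {
        "indices": [
            {
                "index": "my-index",
                "shards": [
                    {"leader_global_checkpoint": 1024, "follower_global_checkpoint": 1024},
                    {"leader_global_checkpoint": 2048, "follower_global_checkpoint": 1900},
                ],
            }
        ]
    }
    print(follower_caught_up(sample))  # prints False
```

Because the leader keeps advancing while writes continue, a True result is only a point-in-time check; it would need to be repeated after writes to the old cluster are stopped.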
Yes, we will have a license on both clusters. However, I hope the replication will not take more than a month, so that we can decommission the old cluster soon afterwards.
@leandrojmp, @Christian_Dahlqvist, @warkolm, do you know whether this approach will work at all, or do I need to start looking at other options (reindexing or a custom solution)?