Migrate data to a new cluster with minimal interruption

Hi,

I need to migrate multiple TB of data to a new cluster. Since this is production data, I’d like to avoid interruption as much as possible, which can be tricky given the volume of data. I have a simple infrastructure: one service is responsible for writing to Elasticsearch, and multiple services read from it. The writer targets several data streams depending on the data. Data is only appended to the current write index of each data stream; previous data is never modified. Each data stream has a policy that rolls its index over every day.
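
For reference, the setup looks roughly like this (a simplified sketch with the Python client, assuming an ILM-style rollover policy; all names are placeholders):

```python
# Simplified sketch of the current setup: a data stream whose lifecycle
# policy rolls the backing index over daily. Names are illustrative only.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://current-cluster:9200", api_key="...")

# Daily rollover policy (assuming ILM; "daily-rollover" is a made-up name).
es.ilm.put_lifecycle(
    name="daily-rollover",
    policy={"phases": {"hot": {"actions": {"rollover": {"max_age": "1d"}}}}},
)

# Index template that turns matching names into data streams using that policy.
es.indices.put_index_template(
    name="logs-app-template",
    index_patterns=["logs-app-*"],
    data_stream={},
    template={"settings": {"index.lifecycle.name": "daily-rollover"}},
)
```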

I need to keep the interruption to the services minimal. The absolute maximum acceptable would be 2 days, but I’d like to stay well below that.

My plan to migrate is the following (a rough API sketch follows the list):

1 - stop the writing service so no new data is added to any index
2 - roll over every data stream so that the current backing index stops receiving writes
3 - take a manual snapshot
4 - run the restore procedure on the new cluster
5 - while the restore is running, restart the writing service and make it write to both clusters to add new data (no modifications to previous indexes, only new ones) → This is so that the read services can still access production data while the restore is happening.
6 - wait for the restore to finish
7 - make the elastic services only query the new cluster
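
In API terms, steps 2 to 4 would look roughly like this with the Python client (repository, snapshot, and endpoint names are placeholders):

```python
# Rough sketch of steps 2-4. Repository and snapshot names are placeholders,
# and the same snapshot repository must be registered on both clusters.
from elasticsearch import Elasticsearch

old_es = Elasticsearch("https://current-cluster:9200", api_key="...")
new_es = Elasticsearch("https://new-cluster:9200", api_key="...")

REPO = "migration-repo"
SNAPSHOT = "migration-full"

# Step 2: roll over every data stream so the current backing index
# is no longer the write index.
for ds in old_es.indices.get_data_stream(name="*")["data_streams"]:
    old_es.indices.rollover(alias=ds["name"])

# Step 3: take a manual snapshot of everything (runs in the background).
old_es.snapshot.create(
    repository=REPO,
    snapshot=SNAPSHOT,
    indices="*",
    include_global_state=False,
    wait_for_completion=False,
)

# Step 4: once the snapshot reports SUCCESS, restore it on the new cluster.
new_es.snapshot.restore(
    repository=REPO,
    snapshot=SNAPSHOT,
    indices="*",
    wait_for_completion=False,
)
```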

Step 5 is the one I'm not sure about. I don't know whether I will be able to write to a data stream that is still being restored on the new cluster, even though the writer never modifies previous indices.
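
For context, the dual-write in step 5 would be something like the sketch below (simplified, no retries or buffering); whether the bulk writes to the new cluster succeed while the restore still holds those data streams is exactly what I'm unsure about.

```python
# Hypothetical dual-write wrapper for step 5: every append goes to both
# clusters. Client setup and error handling are heavily simplified.
from elasticsearch import Elasticsearch, helpers

old_es = Elasticsearch("https://current-cluster:9200", api_key="...")
new_es = Elasticsearch("https://new-cluster:9200", api_key="...")

def append(data_stream: str, docs: list[dict]) -> None:
    """Append-only bulk write of the same documents to both clusters."""
    actions = [
        {"_op_type": "create", "_index": data_stream, "_source": doc}
        for doc in docs
    ]
    for client in (old_es, new_es):
        # If the new cluster rejects writes while the restore is in progress,
        # these documents would need to be buffered and retried.
        helpers.bulk(client, actions)
```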

Does this plan seem OK? If not, what would be the best way to do this that does not require a paid license and avoids too long an interruption?

Will both clusters be running the same version of Elasticsearch, or does the migration to new hardware include a version change?

Is the new cluster running in the same location and/or on the same type of hardware?

What is the reason that drives the move to a new cluster?

What is the size and topology of the current and target clusters?

The new cluster will be deployed with ECK and run the latest version available. The current cluster is running on Elastic Cloud in version 8.17.
The clusters are in different locations, since the new one will run on our Kubernetes cluster.
Both clusters will have the same size and topology at first: 4 hot nodes with 60 GB of memory each. The goal is to shrink the new cluster later so it better matches actual resource usage.

The motivation for this migration is cost. Our company has experience managing ECK and wants to reduce costs by moving the cluster to Kubernetes (where we have more control over infrastructure costs).

The new cluster will be deployed with ECK and run the latest version available. The current cluster is running on Elastic Cloud in version 8.17.

OK, that rules out migrating through a stretched cluster.

What is the longest retention period of your data?

If this is reasonably short and you have a message queue in your ingest pipeline, it might be an option to feed both clusters separately for a period of time until they hold the same data, and then switch over without any downtime at all.
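
Something along these lines, assuming Kafka (or a similar queue) sits in front of the indexer: the same consumer code runs once per cluster, each with its own consumer group, so the two feeds progress independently.

```python
# Sketch of the dual-feed idea. Topic, group and host names are placeholders,
# and kafka-python is just used as an example queue client.
import json
from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

def feed(es_url: str, group_id: str) -> None:
    es = Elasticsearch(es_url, api_key="...")
    consumer = KafkaConsumer(
        "ingest-topic",
        bootstrap_servers=["kafka:9092"],
        group_id=group_id,                     # separate group per target cluster
        value_deserializer=lambda v: json.loads(v),
    )
    for msg in consumer:
        # Data streams are append-only, so op_type "create" is enough.
        es.index(
            index=msg.value["data_stream"],
            document=msg.value["doc"],
            op_type="create",
        )

# One process feeds the old cluster, another the new one, e.g.:
# feed("https://current-cluster:9200", "es-feeder-old")
# feed("https://new-cluster:9200", "es-feeder-new")
```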

How many TB are you migrating? Less than 10? 50? 100?

Since this is append-only, how long do you keep your data, and do you search old data, or are most searches on new data?

Also, what is the license level on the current cluster, and what will be the license level on the new cluster?

Consider going round a snapshot/restore loop repeatedly first. Snapshots and restores are both incremental operations, so although the first one will take a while, later ones will be quicker as the two clusters get more and more synchronized, until you reach the point where you can run the process you suggest, but without needing to start writing in step 5 until the restore is complete.
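
Very roughly, the repeated snapshot pass could look like the sketch below (placeholder names; note that any index already present on the target has to be closed or deleted before it can be restored over, which is the fiddly part).

```python
# Sketch of the repeated snapshot pass on the source cluster. Because
# snapshots are incremental at the repository level, each pass after the
# first only uploads segments written since the previous one.
import time
from elasticsearch import Elasticsearch

old_es = Elasticsearch("https://current-cluster:9200", api_key="...")
REPO = "migration-repo"   # repository reachable from both clusters

for i in range(10):       # repeat until the per-pass delta is small
    old_es.snapshot.create(
        repository=REPO,
        snapshot=f"sync-{i}",
        indices="*",
        include_global_state=False,
        wait_for_completion=True,   # block until this pass finishes
    )
    time.sleep(600)       # let some new data accumulate, then go again

# Between passes, restore the latest snapshot on the new cluster with
# new_es.snapshot.restore(...), after closing or deleting the indices
# that were restored in the previous pass.
```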

IMO it’d be simpler to use cross-cluster search though, leaving the existing data where it is until it ages out while new data accumulates in the new cluster. There’s probably some middle ground where you start with a cross-cluster search setup to do the main switchover, and then migrate the older data across using snapshots, which you can do at a more leisurely pace since it’s not on the critical path.
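
A minimal sketch of the cross-cluster search side, assuming the old Cloud cluster is registered as a remote on the new one (trust and security setup between the clusters is omitted, and the alias, addresses and index patterns are placeholders):

```python
# The read services query the new cluster only; the old cluster stays
# reachable through cross-cluster search until its data ages out.
from elasticsearch import Elasticsearch

new_es = Elasticsearch("https://new-cluster:9200", api_key="...")

# Register the old Elastic Cloud cluster as a remote named "old" (Cloud
# remotes are typically reached in proxy mode; check the address in the UI).
new_es.cluster.put_settings(
    persistent={
        "cluster.remote.old.mode": "proxy",
        "cluster.remote.old.proxy_address": "old-cluster.es.example.com:9400",
    }
)

# A single query then spans local (new) data and remote (old) data.
resp = new_es.search(
    index="logs-app-default,old:logs-app-default",
    query={"range": {"@timestamp": {"gte": "now-30d"}}},
)
```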