Determine when clone index has completed

Hi,
I'm writing some functionality where step 1 would be to back up the current ES index. For this I'm using the _cloneIndex request, which seems like the most appropriate choice. But how would I know for sure when it has finished? Is it enough to check whether the cluster status is yellow or green?

Would it be smarter to use _reindex instead to create a copy of my index? I just realised the _clone request requires the index to be in block-write mode, which will cause a problem with all new incoming index requests while cloning.

The cloning process can be monitored with the _cat recovery API, or the cluster health API can be used to wait until all primary shards have been allocated by setting the wait_for_status parameter to yellow.

The _clone API returns as soon as the target index has been added to the cluster state, before any shards have been allocated. At this point, all shards are in the state unassigned. If, for any reason, the target index can’t be allocated, its primary shard will remain unassigned until it can be allocated on that node.

Once the primary shard is allocated, it moves to state initializing, and the clone process begins. When the clone operation completes, the shard will become active. At that point, Elasticsearch will try to allocate any replicas and may decide to relocate the primary shard to another node.
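To make the waiting part concrete, here is a minimal Python sketch of the cluster health approach, using only the standard library. The base URL, the index name and the helper names (health_url, clone_is_ready, wait_for_clone) are assumptions for illustration. The HTTP 408 handling is also an assumption: some versions answer a health timeout with that status instead of a normal response with timed_out set.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url, index, status="yellow", timeout="30s"):
    """Build a cluster health URL that blocks until `index` reaches `status`."""
    query = urllib.parse.urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health/{urllib.parse.quote(index)}?{query}"


def clone_is_ready(health):
    """True if the wait did not time out and the index is at least yellow."""
    return not health.get("timed_out", True) and health.get("status") in ("yellow", "green")


def wait_for_clone(base_url, index, timeout="30s"):
    """Call the health API once; it returns when yellow is reached or the timeout expires."""
    try:
        with urllib.request.urlopen(health_url(base_url, index, timeout=timeout)) as resp:
            return clone_is_ready(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 408:  # assumption: timeout reported as an HTTP error
            return False
        raise
```

Something like wait_for_clone("http://localhost:9200", "my-index-backup") could then be called in a loop until it returns True. Yellow only says the primaries are allocated; pass status="green" to health_url if the replicas matter too.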

And yes, to clone an index, the index must be marked as read-only and have a cluster health status of green.
The documentation is pretty good and has all the details you need. Please refer to:

  1. Clone API
  2. Reindex API

You can check the prerequisites section for both options.
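As a rough illustration of the clone prerequisite, the sequence of calls could look like the Python sketch below. The index names and the helpers clone_requests and send are made up for this example, and the last step (lifting the write block on the source again) is my assumption about what you would want once the target is active.

```python
import json
import urllib.request


def clone_requests(source, target):
    """Return the (method, path, body) steps for a clone, in order."""
    return [
        # 1. Block writes on the source; the clone API requires this.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # 2. Start the clone; this returns before any shards are allocated.
        ("POST", f"/{source}/_clone/{target}", {}),
        # 3. Assumption: re-enable writes on the source after the clone is active.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": False}}),
    ]


def send(base_url, method, path, body):
    """Send one JSON request to the cluster and return the decoded response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Steps 1 and 2 can be sent back to back. Step 3 should wait until the target index reports at least yellow, and the target itself may also carry the write block copied from the source settings, so check that before writing to it.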

Hi Dinesh,
Thanks for your reply and the links to the documentation. One thing I was wondering about is the following scenario. Let's say I have a big index that I want to back up into another index. I assume it will take a while to create the new index and all the documents in it. What happens to new documents that are indexed during that period? If clone requires read-only, I assume these new index requests will fail. Should I be using reindex instead?

@madshov ,

Yes, you can go with the Reindex API, but even then the suggested approach is to stop ingestion during the process, if possible.

If that's not possible, you may have to take care of the delta updates yourself, that is, copy over the documents that were written to the source index after the reindex started.
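One way to catch up on the delta is a second reindex restricted by a range query. This is only a sketch and rests on an assumption the thread does not confirm: that every document carries a timestamp field (called updated_at here, a hypothetical name) that is set on each write. The helper name delta_reindex_body is also made up.

```python
def delta_reindex_body(source, dest, since, field="updated_at"):
    """Build a _reindex body that copies only documents changed since `since`."""
    return {
        "source": {
            "index": source,
            # Only documents whose (hypothetical) timestamp field is >= `since`.
            "query": {"range": {field: {"gte": since}}},
        },
        "dest": {"index": dest},
    }


# Example: pick up everything written since the first reindex started.
body = delta_reindex_body("my-index", "my-index-backup", "2023-01-01T12:00:00Z")
```

The body would be sent as POST /_reindex. Note that this only picks up new and updated documents; deletes on the source are not carried over this way.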

If possible, have your clients access the index through an alias. The benefit of using an alias is that we can avoid downtime and easily roll back the migration if something is wrong with the new index, because switching the alias is quick.
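The alias switch itself is a single request to the _aliases endpoint with a remove and an add action. A minimal sketch of the request body in Python (the alias and index names and the helper alias_swap_body are placeholders for illustration):

```python
def alias_swap_body(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Example: point the alias at the new index in one request.
body = alias_swap_body("my-alias", "my-index", "my-index-v2")
```

The body would be sent as POST /_aliases. Both actions go in the same request, so clients never see the alias missing, and rolling back is the same call with the two index names swapped.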

Hi @DineshNaik
Thanks for your advice.
