Zero-Downtime Split Operation

I'm trying to figure out the best way to perform a zero-downtime split operation. I have a few questions:

  1. Is it possible for writes to go into the new index while the split is ongoing? Or should I shut down write operations until the split is finished?
  2. I've read forum posts talking about deleteByQuery being run while the split is happening, but this doesn't appear in the current documentation. Am I correct in assuming that the old source index is left intact during the split operation? That is, can I read from the source index during and after the split and simply delete it when I'm ready?

Right now my plan is to:

  1. Upscale so that each node has less than 35% disk usage.
  2. Convert index to read-only.
  3. Trigger the split.
  4. Change the aliases to write to new index.
  5. Wait for split to complete.
  6. Update aliases to read from new index.

Steps 2-4 would all happen as simultaneously as possible. Does this seem like a good plan?

The split index API requires the source index to be set to read-only before it can be split, so you need to stop write operations (as shown in your list of steps).
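As a rough sketch of those two calls: the write block goes on through the index settings API and the split itself through the `_split` endpoint. The index names `logs-v1` / `logs-v2`, the shard count, and the helper functions are made up for illustration, and the helpers only build the requests, they don't send anything. The target shard count has to be a multiple of the source's, so adjust to your setup.

```python
import json

# Hypothetical helpers: each returns (HTTP method, path, JSON body) for one
# Elasticsearch REST call. Nothing is sent to a cluster here.

def write_block_request(source_index):
    """Block writes on the source index; reads keep working."""
    return ("PUT", f"/{source_index}/_settings",
            {"settings": {"index.blocks.write": True}})

def split_request(source_index, target_index, target_shards):
    """Split the source into a new index with more primary shards."""
    return ("POST", f"/{source_index}/_split/{target_index}",
            {"settings": {"index.number_of_shards": target_shards}})

if __name__ == "__main__":
    # Assumed example values, not taken from the thread.
    for method, path, body in (
        write_block_request("logs-v1"),
        split_request("logs-v1", "logs-v2", 4),
    ):
        print(method, path, json.dumps(body))
```

It is worth checking the settings of the new index afterwards, since settings from the source can carry over and you don't want a write block on the index you are about to write to.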

The split operation creates a new index based on the original one, so you can still read from the original during and after the split. It is not removed automatically; you will need to delete it explicitly once you have switched to the new index.

I think you need to swap steps 4 and 5 so you are sure the new index is ready before you start writing to it. I am not sure how long these steps will take to complete.
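Something along these lines is what I have in mind for the reordered steps: wait on cluster health for the new index first, and only then move the alias, with the remove and add in a single `_aliases` call so the switch happens in one step. This is only a sketch; the alias `logs`, the index names, the timeout, and the helper functions are assumptions, and the helpers just build requests or inspect a response rather than talk to a cluster.

```python
# Hypothetical helpers for the "wait, then switch" part of the plan.

def wait_for_target_request(target_index, timeout="60s"):
    """Cluster health call that waits until the target index reports green
    or the timeout passes."""
    path = (f"/_cluster/health/{target_index}"
            f"?wait_for_status=green&timeout={timeout}")
    return ("GET", path, None)

def target_is_ready(health_response):
    """Interpret a cluster health response (already parsed from JSON)."""
    return (health_response.get("status") == "green"
            and not health_response.get("timed_out", True))

def alias_swap_request(alias, source_index, target_index):
    """Move the alias from the old index to the new one in one request."""
    return ("POST", "/_aliases", {"actions": [
        {"remove": {"index": source_index, "alias": alias}},
        {"add": {"index": target_index, "alias": alias}},
    ]})

if __name__ == "__main__":
    # Assumed example values, not taken from the thread.
    print(wait_for_target_request("logs-v2"))
    print(alias_swap_request("logs", "logs-v1", "logs-v2"))
```

If the health call times out, repeat it rather than switching the alias. The old index can be deleted separately once nothing reads from it.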

Why do you have separate aliases for reading and writing?