I have several batch pipelines, each of which produces a new version of its result on every run. The data must stay available at all times, so I can't just delete the index before writing. My current solution is to create a new index named with the run date on each run, point an alias at the latest index once writing finishes, and let ILM take care of the old indices.
There are two main problems here:
- If the pipeline stops for long enough, there may be data downtime if ILM deletes the current index before the next run. I am planning to solve this by programmatically deleting the previous index after updating the alias, instead of relying on ILM.
- If the index under the alias changes mid-read (for example, while paginating through results), the data becomes inconsistent: the first query runs against the old index, and after the alias switch the "next" pages come from the new index, which has different data.
Is there a better way to do this? I couldn't think of a way to address the second problem that isn't prohibitively complicated to implement for the benefit it brings.
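
For reference, here is a minimal sketch of the alias switch and the planned cleanup from the first bullet. It only builds the request body for the `_aliases` endpoint; it does not talk to a cluster. The function names, the `results` alias, and the date format in the index name are placeholders I made up for illustration, not my real setup. It assumes that the `add`, `remove`, and `remove_index` actions in a single `_aliases` request are applied together, so readers never see the alias pointing at nothing.

```python
import json
from datetime import date
from typing import Optional


def index_name_for_run(prefix: str, run_date: date) -> str:
    """Build a dated index name such as 'results-2024.01.02' (hypothetical scheme)."""
    return f"{prefix}-{run_date:%Y.%m.%d}"


def build_alias_swap(
    alias: str,
    new_index: str,
    old_index: Optional[str] = None,
    delete_old: bool = False,
) -> dict:
    """Return the body for POST /_aliases that moves `alias` to `new_index`.

    If `delete_old` is true, the previous index is dropped in the same
    request via `remove_index`; otherwise only the alias is detached from it.
    """
    actions = [{"add": {"index": new_index, "alias": alias}}]
    if old_index is not None:
        if delete_old:
            # Deleting the index also removes the alias from it.
            actions.append({"remove_index": {"index": old_index}})
        else:
            actions.append({"remove": {"index": old_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    old = index_name_for_run("results", date(2024, 1, 1))
    new = index_name_for_run("results", date(2024, 1, 2))
    body = build_alias_swap("results", new, old_index=old, delete_old=True)
    # This JSON would be sent as the body of POST /_aliases.
    print(json.dumps(body, indent=2))
```

This covers only the first problem. It does nothing for a reader who is halfway through paginating when the alias moves, which is the part I'm stuck on.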