Hi Team,
I want to reindex multiple indices that contain a lot of data. To reduce CPU usage, I am using the Reindex API (POST _reindex) with the requests_per_second parameter. If newer data arrives during the reindexing, it can be managed with an alias, but what about updated data? How can I transfer or reflect updates to existing documents in the new index after the reindexing? Will Elasticsearch manage transferring the updated data to the new index?
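For reference, a throttled background reindex request might look like the following sketch (the index names are placeholders; requests_per_second is passed as a URL parameter):

```
POST _reindex?requests_per_second=500&wait_for_completion=false
{
  "source": { "index": "old-index" },
  "dest": { "index": "new-index" }
}
```

With wait_for_completion=false the call returns a task ID, and the throttle can later be changed with the _reindex/{task_id}/_rethrottle endpoint.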
One approach I considered was to pause ingestion during the reindexing and resume it afterwards, but I don't know whether that would work.
Is there a better approach that avoids data loss and achieves zero downtime, with only minimal delay?
I would appreciate your advice on this matter.
No. The data that exists at the time the reindexing job is initiated is what will be reindexed.
Pausing updates during the reindexing and then switching to the new indices is the safest way, but it does result in potentially long downtime.
If you have (or added) a last updated timestamp on the documents (e.g. through an ingest pipeline that sets it to when the document actually reached Elasticsearch) you may be able to perform an initial reindexing of the bulk of the data. Once that is complete you could then stop ingestion and run a separate reindexing job to catch all updates based on the update timestamp before switching over to the new index. This would potentially reduce downtime but may not catch deletes, so could lead to some inconsistencies. Might be worth testing though.
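As a sketch of the approach above, an ingest pipeline could stamp every document with the time it reached Elasticsearch, and the catch-up pass could then reindex only documents written after the initial bulk copy started. The pipeline name, field name, and cutoff timestamp here are placeholder assumptions:

```
PUT _ingest/pipeline/set-last-updated
{
  "processors": [
    {
      "set": {
        "field": "last_updated",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```

After the bulk reindex completes and ingestion is stopped, the catch-up reindex would filter on that field, using the time the bulk copy began as the cutoff:

```
POST _reindex
{
  "source": {
    "index": "old-index",
    "query": {
      "range": {
        "last_updated": { "gte": "2024-01-01T00:00:00Z" }
      }
    }
  },
  "dest": { "index": "new-index" }
}
```

Since reindex writes by document ID, re-copied documents simply overwrite their earlier versions in the new index; as noted, deletes are not captured this way.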
I don't want to stop ingestion of data, as it is a never-ending process. Is there any other way?
I think you need to stop it at some point if you do not want to risk losing data, but you can limit the downtime by copying over data in advance so that only a limited amount of data needs to be reindexed during the window.
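Once the final catch-up pass has finished, the switch itself can be made atomic with the aliases API, so clients never see a moment without a backing index (index and alias names here are placeholders):

```
POST _aliases
{
  "actions": [
    { "remove": { "index": "old-index", "alias": "my-data" } },
    { "add": { "index": "new-index", "alias": "my-data" } }
  ]
}
```

Both actions are applied in a single atomic operation, which is why reading and writing through an alias from the start makes this kind of migration much easier.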