Concurrent updates to Elasticsearch while _reindex is in progress


(sandeep) #1

Hi team,

We have been using this link as a reference to accommodate any change in the mappings for a field in our index with zero downtime.

Question:
Taking the same example as in the link above: when we reindex the data from my_index_v1 to my_index_v2 using the _reindex API, does Elasticsearch guarantee that any concurrent updates happening in my_index_v1 will also make it to my_index_v2?

For example, a document might get updated in my_index_v1 either before or after the _reindex API has copied it to my_index_v2.

Ultimately, we want zero downtime for mapping changes (hence the _reindex with aliases and the other cool stuff ES offers), but we also need to be sure that no adds or updates are missed while this huge reindex is in progress, since we are reindexing more than 50GB of data.

Thanks,
Sandeep


(Mark Walkom) #2

Just like _update_by_query, _reindex gets a snapshot of the source index

That's from the docs, which means no, it does not. _reindex takes that snapshot when you make the request and then reindexes from it. Changes made before the snapshot are carried across; changes made after it are not processed.
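
One way to pick up writes that landed after the snapshot is to run a second, smaller _reindex once the first one finishes, restricted to recently changed documents. The sketch below only builds the request body, which you would POST to _reindex. It assumes your documents carry a last-modified timestamp; the field name updated_at, the cutoff value, and the helper name are all hypothetical and not from your setup. Setting version_type to external on the destination means a document is only overwritten when the source copy has a higher version, and conflicts set to proceed keeps the job running past documents that are already up to date. Note that this does not carry deletes across.

```python
import json


def build_catchup_reindex_body(source_index, dest_index, timestamp_field, since):
    """Build a _reindex request body that copies only documents changed since `since`.

    `timestamp_field` is an assumed last-modified field on each document;
    _reindex does not track modification times by itself.
    """
    return {
        # Count version conflicts instead of aborting the whole request.
        "conflicts": "proceed",
        "source": {
            "index": source_index,
            # Only documents modified at or after the cutoff are selected.
            "query": {"range": {timestamp_field: {"gte": since}}},
        },
        "dest": {
            "index": dest_index,
            # Keep the source version; overwrite only older copies in the destination.
            "version_type": "external",
        },
    }


# Hypothetical cutoff: a time shortly before the first _reindex was started.
body = build_catchup_reindex_body(
    "my_index_v1", "my_index_v2", "updated_at", "2024-01-01T00:00:00Z"
)
print(json.dumps(body, indent=2))
```

Writes can still arrive while the catch-up pass runs, so you would either repeat it until nothing new is copied or briefly pause writers before switching the alias.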

