[Reindexing] Is there any drawback to writing to a new index while re-indexing?

Chris_Gatihi · September 14, 2018, 11:30pm

I'm working on a project where we are considering two approaches for zero-downtime re-indexing:

1. Create a new index and begin re-indexing data into it. While re-indexing, new indexing requests get written to both the new index and the old index, and searches are run against an alias pointing to the old index. Once re-indexing is complete, switch the search alias to point to the new index.
2. Create a new index and a temporary write index, and begin re-indexing data into the former. While re-indexing, new indexing requests get written to the temporary write index, and searches are run against an alias pointing to both the old index and the temporary write index. Once re-indexing is complete, writes start going to the new index, the temporary index is copied over, and the search alias is switched to point to the new index.
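
Both approaches end with the same step: repointing the search alias. As a rough sketch, the body for a `POST /_aliases` request that removes the alias from the old index and adds it to the new one in a single call could be built like this. The helper name and the index and alias names are made up for illustration, and the code only builds the request body; it does not talk to a cluster.

```python
from typing import Any, Dict


def alias_swap_body(alias: str, old_index: str, new_index: str) -> Dict[str, Any]:
    """Build a body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions travel in one request, so searches should never see the
    alias pointing at neither index. All names here are illustrative.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, not taken from the thread.
body = alias_swap_body("search", "products_v1", "products_v2")
print(body)
```

For approach #2 the list would need one more `remove` action for the temporary write index.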

Approach #2 definitely seems more complex. The only reason we're considering it is because we're not sure if there might be any edge case race conditions if we attempt to write to the new index while re-indexing (approach #1). With approach #1, it looks like the new index is writeable while re-indexing is happening but is there any potential for conflict if we do write to it before the re-indexing finishes? Specifically, could there be problems in the case of an update (while re-indexing is happening) to a search document that already exists in the original index?
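
To make the concern concrete, here is a small in-memory sketch of the race in approach #1. It assumes the reindex works from a point-in-time view of the old index and, by default, overwrites any destination document with the same ID. The dictionaries stand in for indices and `simulate_reindex` is a made-up helper, not an Elasticsearch call. `reindex_body` shows the `_reindex` options (`op_type: create` in `dest`, plus `conflicts: proceed`) that are meant to skip documents already present in the destination instead of overwriting them.

```python
from typing import Any, Dict

Index = Dict[str, Dict[str, Any]]  # doc id -> document; a stand-in for a real index


def simulate_reindex(source_snapshot: Index, dest: Index, op_type: str = "index") -> int:
    """Copy documents into dest and return the number of skipped conflicts.

    op_type="index" overwrites documents that share an ID.
    op_type="create" only adds missing documents and counts the rest as conflicts.
    """
    conflicts = 0
    for doc_id, doc in source_snapshot.items():
        if op_type == "create" and doc_id in dest:
            conflicts += 1  # destination already holds this ID; leave it alone
            continue
        dest[doc_id] = dict(doc)
    return conflicts


def reindex_body(source_index: str, dest_index: str) -> Dict[str, Any]:
    """Body for POST /_reindex that keeps documents already in the destination."""
    return {
        "conflicts": "proceed",  # count version conflicts instead of aborting
        "source": {"index": source_index},
        "dest": {"index": dest_index, "op_type": "create"},
    }


# The reindex sees the old index as it was when the reindex started.
source_snapshot: Index = {"1": {"title": "stale"}, "2": {"title": "untouched"}}

# A live update to doc 1 reaches the new index before the reindex copies doc 1.
new_index_overwrite: Index = {"1": {"title": "fresh"}}
simulate_reindex(source_snapshot, new_index_overwrite, op_type="index")

new_index_create: Index = {"1": {"title": "fresh"}}
skipped = simulate_reindex(source_snapshot, new_index_create, op_type="create")

print(new_index_overwrite["1"], new_index_create["1"], skipped)
```

In this sketch the overwrite run ends up with the stale copy of doc 1, while the create run keeps the live update. It does not cover deletes: a document deleted from the new index before the reindex reaches it would be recreated from the old copy, so that case would need separate handling.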

jaddison · September 15, 2018, 3:25am

One thing to consider is the possibility that your mapping may change radically between the old index and the new one. You might add a new field, modify an existing field, or remove a field. Application code that relies on your indexes storing documents in a specific way (either the new or the old index, depending on your approach) could be 'broken' while you wait for the alias switch.

Much of this depends on how your application logic changes - how you approach this problem one time might not fit the next time.

One way to mitigate some of the effects mentioned is to run parallel application stacks during the reindex - a heavy-handed approach!

In the projects where I handle this, I (currently) accept the fact that nothing's perfect, and sometimes I may have downtime, depending on the level of application logic change. Most of the time, it's zero downtime though.

