Fwd: Handling live updates while reindexing data

Prannoy_Mittal · August 29, 2016, 4:31pm

Hi,

I have gone through
https://www.elastic.co/blog/changing-mapping-with-zero-downtime on
reindexing with zero downtime.

It has no information on handling live updates that are still going into the old index.

Solutions I have thought of:

1. Queue updates and deletes, and apply these instructions to the new index
   when reindexing is done.
   (The issue I can see is that this does not update the old index, so my
   current search queries will not be up to date.)
2. Keep performing live updates on the old index and queue them as well. When
   I am done with the reindex, reissue the queued commands against the new index.
   (The issue here is that there can be data inconsistencies.)

I can't reindex from the old index into the new index, as my old documents do
not contain some new fields. I will always need to reindex from the
source of truth (SQL). As this SQL DB is being updated at a high
rate, how can I reindex into a new Elasticsearch index?

It would be really helpful if I could get some pointers.

Thanks in advance.
Prannoy Mittal.

nik9000 · August 29, 2016, 5:18pm

In the past I've implemented both option 2 and option 1. It honestly depends on what your users expect and can handle. In my case a little bit of going back in time wasn't a big deal if it was corrected in a few seconds, and usually that is how long it took.
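
As a rough illustration of option 2, here is a minimal sketch that uses plain dictionaries in place of real indices. The names (`old_index`, `new_index`, `pending`, `apply_op`, `handle_live_update`) and the sample documents are hypothetical, and it assumes every queued operation is a full-document write or a delete, so replaying it is safe.

```python
from collections import deque

# Plain dicts stand in for the two indices (assumption for illustration only).
old_index = {"1": {"title": "a"}, "2": {"title": "b"}}
new_index = {}
pending = deque()  # operations received while the rebuild is running


def apply_op(index, op):
    """Apply one ("index", id, doc) or ("delete", id, None) operation."""
    kind, doc_id, doc = op
    if kind == "index":
        index[doc_id] = doc
    else:
        index.pop(doc_id, None)


def handle_live_update(op):
    """Option 2: keep the old index current and remember the operation."""
    apply_op(old_index, op)
    pending.append(op)


# The rebuild reads a snapshot of the source of truth taken at this moment.
snapshot = {"1": {"title": "a", "extra": 1}, "2": {"title": "b", "extra": 2}}

# Live traffic that arrives while the rebuild is still running.
handle_live_update(("index", "1", {"title": "a2", "extra": 1}))
handle_live_update(("delete", "2", None))

# The rebuild finishes writing the (now stale) snapshot into the new index.
new_index.update(snapshot)

# Replay the queue in arrival order; after this the alias can be switched.
while pending:
    apply_op(new_index, pending.popleft())
```

Until the replay finishes, the new index still holds the stale snapshot values, which is the brief "going back in time" window mentioned above if searches move to it too early.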

If you have an external source of truth you might want to have a look at Elasticsearch's version_type=external semantics. If you set up whatever system is syncing the truth source into Elasticsearch to write to both the new and the old index, and always send the version along with version_type=external, then Elasticsearch will reject any update that would downgrade the document. You'd still have to handle deletes yourself, because reindex gets a snapshot of the data at a point in time and won't notice deletes done to the source index.
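
To make the ordering argument concrete, here is a small in-memory model of the external-version rule: a write is accepted only if its version is strictly greater than the stored one. `ToyIndex`, `VersionConflict` and `sync` are hypothetical stand-ins, not the Elasticsearch client; against a real cluster the same idea is expressed by sending `version` and `version_type=external` with each index request.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class ToyIndex:
    """In-memory stand-in for an index that applies external-version rules."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version):
        stored = self.docs.get(doc_id)
        # External versioning: accept only a strictly greater version.
        if stored is not None and version <= stored[0]:
            raise VersionConflict(f"{doc_id}: {version} <= {stored[0]}")
        self.docs[doc_id] = (version, source)


def sync(index, doc_id, source, version):
    """Write one document, treating a version conflict as 'already newer'."""
    try:
        index.index(doc_id, source, version)
        return True
    except VersionConflict:
        return False


old_index, new_index = ToyIndex(), ToyIndex()
sync(old_index, "42", {"name": "first"}, version=1)

# The bulk copy works from a snapshot taken while the document was at version 1.
snapshot = [("42", {"name": "first"}, 1)]

# A live update reaches both indices before the bulk copy gets to the document.
for idx in (old_index, new_index):
    sync(idx, "42", {"name": "second"}, version=2)

# The bulk copy now writes the stale snapshot row; the new index rejects it.
applied = [sync(new_index, doc_id, src, ver) for doc_id, src, ver in snapshot]
```

Because the stale write loses regardless of arrival order, the live sync and the bulk copy can run concurrently. This model covers index operations only; deletes need the separate handling described above.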

If the external source of truth is fast enough then you can just rebuild the whole index from it instead. In my case it was several orders of magnitude faster and less resource intensive to rebuild from Elasticsearch itself but if you can get away with being able to refresh from the source of truth then it is probably worth it.

Prannoy_Mittal · September 1, 2016, 6:11am

Thanks @nik9000. Using the external version type is really cool, but in my case the data fed into ES is combined from multiple tables in relational DBs into a single document of nested objects. Using the last-updated time of one table (the least recently updated table) as the version can lead to data inconsistencies when partial updates of an ES document happen simultaneously.
