Editing and re-indexing large amounts of data in Elasticsearch (millions of records)

I recently made a new version of an index for my Elasticsearch data with some new fields included. I re-indexed from the old index, so the new index has all of the old data along with an updated mapping that includes the new fields.

Now, I'd like to update all of the documents in the new index to populate these new fields, which I can calculate by making separate database and API calls to other sources.

What is the best way to do this, given that there are millions of records in the index?

Logistically speaking, I'm not sure how to accomplish this. How can I keep track of which records I've already updated? I've been reading about the scroll API, but I'm not certain it's viable because of the maximum scroll time of 24 hours (what if the whole job takes longer than that?). Another serious consideration: since I need to make other database calls to calculate the new field values, I don't want to hammer that database for too long in a single session.
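To make it concrete, here's roughly what I had in mind with the scroll API via the Python client (just a sketch, assuming the elasticsearch-py 8.x client; the index name and the `lookup_new_fields` helper are placeholders for my own setup):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

INDEX = "my-new-index"   # placeholder index name
BATCH_SIZE = 500

def lookup_new_fields(source):
    # Placeholder for the separate database / API calls that
    # compute the new field values for one document.
    return {"new_field_a": "placeholder", "new_field_b": "placeholder"}

def update_actions():
    # helpers.scan wraps the scroll API; if I understand the docs right,
    # the `scroll` keep-alive is renewed on every batch, so it only needs
    # to outlast the per-batch work, not the whole job.
    for hit in helpers.scan(
        es, index=INDEX, query={"query": {"match_all": {}}},
        scroll="10m", size=BATCH_SIZE,
    ):
        yield {
            "_op_type": "update",
            "_index": INDEX,
            "_id": hit["_id"],
            "doc": lookup_new_fields(hit["_source"]),
        }

# Partial updates via the bulk API; only the new fields are touched.
helpers.bulk(es, update_actions(), chunk_size=BATCH_SIZE)
```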

Would there be some way to run the update for, say, 10 minutes every night, while keeping track of which records have been updated and which still need updating?
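Something like the sketch below is what I'm imagining for the nightly window (again assuming the 8.x Python client, with made-up index and field names): by only querying for documents that don't yet have one of the new fields, the index itself keeps track of what still needs updating, and the loop simply stops when the time budget runs out.

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

INDEX = "my-new-index"          # placeholder
MARKER_FIELD = "new_field_a"    # one of the new fields; its absence means "not done yet"
TIME_BUDGET_SECONDS = 10 * 60   # roughly 10 minutes per night
BATCH_SIZE = 200

def lookup_new_fields(source):
    # Placeholder for the external database / API calls.
    # Real values must not be null, or the exists filter below won't see them.
    return {"new_field_a": "placeholder", "new_field_b": "placeholder"}

# Only fetch documents that haven't been enriched yet.
pending_query = {"bool": {"must_not": {"exists": {"field": MARKER_FIELD}}}}

deadline = time.monotonic() + TIME_BUDGET_SECONDS
while time.monotonic() < deadline:
    resp = es.search(index=INDEX, query=pending_query, size=BATCH_SIZE)
    hits = resp["hits"]["hits"]
    if not hits:
        break  # nothing left to update

    actions = [
        {
            "_op_type": "update",
            "_index": INDEX,
            "_id": hit["_id"],
            "doc": lookup_new_fields(hit["_source"]),
        }
        for hit in hits
    ]
    # refresh="wait_for" so the next search no longer returns these documents.
    helpers.bulk(es, actions, refresh="wait_for")
```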

I'm just not sure about a lot of this, so I'd appreciate any insights or other ideas on how to go about it.

Have you thought about temporarily loading those other sources into Elasticsearch as well, and then using a reindex combined with an ingest pipeline and an enrich processor?

Not sure how complex your new data is...
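Roughly along these lines, assuming you can dump the other sources into a lookup index first (all index, policy, and field names below are made up, and it assumes the elasticsearch-py 8.x client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# 1. Load the other sources into a lookup index, e.g. "lookup-data",
#    with a field "record_id" to join on plus the fields you want to copy across.

# 2. Create and execute an enrich policy over that lookup index.
es.enrich.put_policy(
    name="add-new-fields-policy",
    match={
        "indices": "lookup-data",
        "match_field": "record_id",
        "enrich_fields": ["new_field_a", "new_field_b"],
    },
)
es.enrich.execute_policy(name="add-new-fields-policy")

# 3. Ingest pipeline that enriches each document as it is indexed.
es.ingest.put_pipeline(
    id="add-new-fields-pipeline",
    processors=[
        {
            "enrich": {
                "policy_name": "add-new-fields-policy",
                "field": "record_id",        # field on the incoming doc to match on
                "target_field": "enriched",  # enriched fields land under this key
            }
        }
    ],
)

# 4. Reindex through the pipeline.
es.reindex(
    source={"index": "my-new-index"},
    dest={"index": "my-enriched-index", "pipeline": "add-new-fields-pipeline"},
    wait_for_completion=False,  # runs as a task; poll the Tasks API for progress
)
```

If you'd rather enrich the existing index in place instead of writing to a new one, the same pipeline should also work with `_update_by_query` by passing `pipeline="add-new-fields-pipeline"`.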
