How to reindex using a date field in the source index?

I have a source index with 14 million docs (2.5 TB). Reindexing it all at once destabilizes the cluster, and the process stops midway. There is no way of tracking the progress or resuming. So I would like to know how to reindex using a certain date field. For example, I would like to reindex from year 1995 to 1998 in one go and then 1998 to 2001 in another, so that if the process stops at any time, I know where it stopped and where to resume from.

Someone talks about something like that in the above topic.

The reindex API supports a query part. See https://www.elastic.co/guide/en/elasticsearch/reference/7.2/docs-reindex.html

So if you have a date field, you can use a range query to select only the documents between two dates.
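
A minimal sketch of what such request bodies could look like, built in Python so they can be printed and sent to `POST _reindex`. The index names `old-index` and `new-index` and the field name `created_at` are placeholders, not taken from this thread, and the date strings assume the field is mapped as a date with the default format. Each window is half-open (`gte` on the start, `lt` on the end) so that a document sitting exactly on a boundary is not selected by two windows.

```python
import json


def reindex_body(source_index, dest_index, date_field, start, end):
    """Build a _reindex body copying only docs with start <= date_field < end."""
    return {
        "source": {
            "index": source_index,
            "query": {"range": {date_field: {"gte": start, "lt": end}}},
        },
        "dest": {"index": dest_index},
    }


def year_windows(first_year, last_year, step):
    """Yield (start_year, end_year) pairs covering first_year up to last_year."""
    for year in range(first_year, last_year, step):
        yield year, min(year + step, last_year)


# Placeholder names: "old-index", "new-index", "created_at".
bodies = [
    reindex_body("old-index", "new-index", "created_at",
                 f"{start}-01-01", f"{end}-01-01")
    for start, end in year_windows(1995, 2001, 3)
]

# One body per window (1995 to 1998, then 1998 to 2001); print the first one.
print(json.dumps(bodies[0], indent=2))
```

If a window fails, only that window needs to be run again.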

Yeah, that seems good. Our index is pretty big. Can I run multiple queries at once, like one query per year? And how do we make sure we got all our documents?

I don't know if you can run multiple reindex tasks at the same time.
To compare what you have in both indices, just count the number of documents in both indices.
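
A rough sketch of that comparison using the `_count` API and only the Python standard library. The base URL, index names, and the helper names `count_docs` and `missing_docs` are assumptions for illustration. Passing the same range query that was used for a reindex window lets you check one window at a time. Counts only reflect documents that are already visible to search, so the destination may need a refresh first.

```python
import json
from urllib import request


def count_docs(base_url, index, query=None, opener=request.urlopen):
    """Return the number of docs in `index` matching `query` (default: all)."""
    body = {"query": query or {"match_all": {}}}
    req = request.Request(
        f"{base_url}/{index}/_count",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return json.load(resp)["count"]


def missing_docs(base_url, source_index, dest_index, query=None,
                 opener=request.urlopen):
    """Return source count minus destination count for the same query."""
    source_count = count_docs(base_url, source_index, query, opener)
    dest_count = count_docs(base_url, dest_index, query, opener)
    return source_count - dest_count
```

For example, `missing_docs("http://localhost:9200", "old-index", "new-index")` would return 0 once both indices hold the same number of documents (the URL and index names here are placeholders).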

FYI. We just tried that. It works. We can run multiple reindex tasks at a time.
On another note, may I know how "conflicts": "proceed" works in the reindex API?

The documentation of the reindex API is not clear about it. What does it actually treat as conflicts, etc.?

If you meant conflicts when several jobs are running in parallel, my guess is that the last change overwrites any previous changes.
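
For what it's worth, as far as I understand the reindex docs, "conflicts" refers to version conflicts: by default a version conflict aborts the reindex, while `"conflicts": "proceed"` makes it count the conflict and carry on. One case where this matters is `"op_type": "create"` on the destination, where documents that already exist there are reported as version conflicts instead of being overwritten. Below is a minimal sketch of such a body. The index names and the helper name `resumable_reindex_body` are placeholders.

```python
import json


def resumable_reindex_body(source_index, dest_index, query=None):
    """Build a _reindex body that only creates docs missing from dest_index."""
    body = {
        # Count version conflicts and keep going instead of aborting.
        "conflicts": "proceed",
        "source": {"index": source_index},
        # "create" skips docs that already exist in the destination;
        # each skipped doc is reported as a version conflict.
        "dest": {"index": dest_index, "op_type": "create"},
    }
    if query is not None:
        body["source"]["query"] = query
    return body


# Placeholder index names.
print(json.dumps(resumable_reindex_body("old-index", "new-index"), indent=2))
```

With a body like this, running an interrupted window again should only add the documents that are still missing, and the `version_conflicts` number in the response shows how many were already there.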
