Improving performance of reindex API?

A larger batch size is usually better. You set it with size under source in the reindex request. The default batch size in 5.0 and 2.4 will be 1000, which is fairly reasonable.
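
As a rough sketch of what that looks like, here is a small Python helper that builds a reindex request body with a custom batch size. The helper name, the index names, and the value 5000 are made up for illustration; the only real API detail relied on is that size under source controls the batch size.

```python
import json


def build_reindex_body(source_index, dest_index, batch_size=1000):
    """Build a _reindex request body (hypothetical helper).

    "size" under "source" is the number of documents per batch.
    """
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


# Example index names and batch size are assumptions, not recommendations.
body = build_reindex_body("old_index", "new_index", batch_size=5000)
print(json.dumps(body, indent=2))
```

You would POST the printed JSON to /_reindex. It is worth measuring a few batch sizes on your own data rather than assuming the biggest one wins.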

Another thing you can do is set refresh_interval to -1 on the destination index before you start the reindex. That disables periodic refreshes and usually makes indexing much more efficient. You can set refresh=true in the reindex request and it'll trigger a refresh when it is done. Don't forget to reset refresh_interval to its previous value once the reindex finishes.
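
The order of operations could be sketched like this in Python. The function only builds the three requests (method, path, JSON body) and does not send anything. The function name and index names are hypothetical, and the "1s" restore value is an assumption (it is the Elasticsearch default); use whatever your index had before.

```python
import json


def reindex_with_refresh_disabled(dest_index, reindex_body, restore_interval="1s"):
    """Return the requests to issue, in order (hypothetical helper).

    restore_interval defaults to "1s" as an assumption; pass the value
    the destination index had before you changed it.
    """
    settings_path = "/%s/_settings" % dest_index
    return [
        # 1. Disable periodic refreshes on the destination index.
        ("PUT", settings_path, {"index": {"refresh_interval": "-1"}}),
        # 2. Run the reindex; refresh=true triggers a refresh when it is done.
        ("POST", "/_reindex?refresh=true", reindex_body),
        # 3. Put the refresh interval back.
        ("PUT", settings_path, {"index": {"refresh_interval": restore_interval}}),
    ]


body = {"source": {"index": "old_index"}, "dest": {"index": "new_index"}}
for method, path, payload in reindex_with_refresh_disabled("new_index", body):
    print(method, path, json.dumps(payload))
```

If the reindex fails partway, you still want step 3 to run, so in a real script it is probably worth putting it in a finally block.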

You can also slice the reindex into multiple concurrent requests with something like a range query. For example, issue 10 concurrent reindexes, each with a non-overlapping range query.
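
Here is one way the slicing could be sketched in Python, assuming the documents have a numeric field you can range over. The field name "id", the bounds 0 to 1000, and the helper names are all hypothetical. The code only builds the request bodies; you would POST each one to /_reindex from its own thread or process.

```python
def range_slices(lo, hi, n):
    """Split [lo, hi) into n contiguous, non-overlapping half-open ranges."""
    bounds = [lo + (hi - lo) * i // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def sliced_reindex_bodies(source_index, dest_index, field, lo, hi, n):
    """Build one _reindex body per slice (hypothetical helper)."""
    return [
        {
            "source": {
                "index": source_index,
                # gte/lt makes each range half-open, so no document
                # falls into two slices.
                "query": {"range": {field: {"gte": start, "lt": end}}},
            },
            "dest": {"index": dest_index},
        }
        for start, end in range_slices(lo, hi, n)
    ]


# "id", 0 and 1000 are assumptions about your data.
bodies = sliced_reindex_bodies("old_index", "new_index", "id", 0, 1000, 10)
print(len(bodies), "requests; first range:", bodies[0]["source"]["query"])
```

The ranges are half-open (gte on the lower bound, lt on the upper) so that a document sitting exactly on a boundary is copied by exactly one request. Make sure hi is strictly greater than the largest value in the field, or the last documents will be missed.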
