Hi guys!
I'm reindexing a lot of data. To improve performance and make error handling more "elastic", I've divided every source index into parts (ranges over a field). I then parallelize the process by reindexing several ranges at a time. This also has the benefit of isolating the ranges, so if a range fails, I can investigate it and restart just that range.
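A minimal sketch of the range-based approach, assuming a numeric field. The field name, index names, and the `submit` callable are hypothetical placeholders. `submit` stands in for whatever sends one `_reindex` request, and it should raise on failure, including a non-empty `failures` array in the response. Each range becomes its own `_reindex` body with a `range` query, and the ranges run on a small thread pool. Failed ranges are returned so they can be inspected and retried one at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(start, end, parts):
    """Split [start, end) into `parts` contiguous half-open ranges."""
    step, rem = divmod(end - start, parts)
    ranges, lo = [], start
    for i in range(parts):
        # Spread the remainder over the first `rem` ranges.
        hi = lo + step + (1 if i < rem else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def range_reindex_body(src, dest, field, lo, hi):
    """Build a _reindex request body limited to field values in [lo, hi)."""
    return {
        "source": {"index": src, "query": {"range": {field: {"gte": lo, "lt": hi}}}},
        "dest": {"index": dest},
    }


def reindex_ranges(src, dest, field, start, end, parts, submit, workers=4):
    """Reindex every range in parallel; return the ranges whose request failed.

    `submit` is a hypothetical callable that sends one body to _reindex
    and raises if the request (or any document in it) failed.
    """
    def run(r):
        try:
            submit(range_reindex_body(src, dest, field, *r))
            return None
        except Exception:
            return r  # keep the failed range so it can be retried on its own

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, split_ranges(start, end, parts)))
    return [r for r in results if r is not None]
```

Retrying a failed range is then just another call to `submit(range_reindex_body(...))` with the same bounds.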
In Elasticsearch 5.1 we have reindexing with sliced scrolls. I have two questions:
- How does slicing compare with the approach above (manually specifying ranges) in terms of performance?
- How does slicing compare with the approach above (manually specifying ranges) in terms of error handling? That is, if there is a problem with a single slice, can I simply restart it while the rest of the documents are still being reindexed?
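For comparison, a sketch of manual slicing, where each slice is sent as a separate `_reindex` request with a `source.slice` object (`id`, `max`). This assumes that manual slicing is what you would use to keep per-slice control. As far as I understand, the automatic `slices` URL parameter runs all slices under one parent task rather than as independently restartable requests. The index names and `submit` are hypothetical placeholders, as in the range example. The sketch also assumes that re-running one slice only selects the same documents if `max` stays the same and the source index has not changed in between.

```python
def slice_reindex_body(src, dest, slice_id, max_slices):
    """Build a _reindex body covering one manual slice of the source index."""
    return {
        "source": {"index": src, "slice": {"id": slice_id, "max": max_slices}},
        "dest": {"index": dest},
    }


def sliced_reindex_bodies(src, dest, max_slices):
    """One body per slice; each can be submitted (and retried) independently."""
    return [slice_reindex_body(src, dest, i, max_slices) for i in range(max_slices)]


def retry_slice(src, dest, slice_id, max_slices, submit):
    """Re-send a single slice; `submit` is a hypothetical request sender."""
    # Keep max_slices identical to the original run so the slice selects
    # the same partition of documents.
    submit(slice_reindex_body(src, dest, slice_id, max_slices))
```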
Thanks in advance!
Haris