We upgraded ES from 2 to 5 and want to do a full reindex so that our indices are upgradable to ES6.
Given clusters running on AWS D2 nodes, each holding hundreds of 100GB+ indices configured with 1 primary shard and 0 replicas, what would be a good reindexing strategy?
Nodes don't have a lot of extra disk space (~80% full).
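To put "~80% full" in numbers: reindexing an index before deleting its source needs room for a second copy. The sketch below computes how many GB a node can still take before crossing a disk watermark. It assumes the ES 5.x default watermarks (low 85%, high 90%) and a hypothetical per-node disk size, and it treats the new copy as the same size as the source, which is only a rough approximation.

```python
# Rough disk-headroom arithmetic for reindexing one index at a time.
# Assumptions (not from the post): a hypothetical node disk size and the
# ES 5.x default disk watermarks, low = 85% and high = 90%.


def headroom_gb(disk_gb: float, used_pct: int, watermark_pct: int) -> float:
    """GB that can still be written before usage reaches watermark_pct."""
    # Integer percentages avoid float noise such as 0.85 - 0.80 != 0.05.
    return max(0.0, disk_gb * (watermark_pct - used_pct) / 100)


def fits(index_gb: float, disk_gb: float, used_pct: int, watermark_pct: int = 90) -> bool:
    """True if a full copy of an index fits on the node before the watermark."""
    return index_gb <= headroom_gb(disk_gb, used_pct, watermark_pct)


if __name__ == "__main__":
    DISK_GB = 6000  # hypothetical node capacity, adjust to the real D2 size
    print("headroom to low watermark (GB):", headroom_gb(DISK_GB, 80, 85))
    print("headroom to high watermark (GB):", headroom_gb(DISK_GB, 80, 90))
    print("100 GB index fits:", fits(100, DISK_GB, 80))
```

At 80% used, only 5 to 10 percentage points of the disk are usable before the watermarks start affecting shard allocation, which is why running many copies at once is risky on these nodes.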
Our current plan is to reindex and then delete one index at a time with wait_for_completion=true, but initial tests show this is slow: we're seeing an average throughput of 4.5 MB/s.
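A minimal sketch of that one-at-a-time plan, plus the time it implies at the observed throughput. It only builds the HTTP requests (method, path, body) for the `_reindex` and delete-index APIs and does not send them; the `-v5` destination suffix is a hypothetical naming choice, and the estimate assumes a steady rate with 1 GB = 1024 MB.

```python
from typing import Dict, List, Tuple

# (HTTP method, path, JSON body) for one request against the cluster.
Request = Tuple[str, str, Dict]


def sequential_plan(indices: List[str], suffix: str = "-v5") -> List[Request]:
    """Blocking reindex of each index, followed by deleting its source."""
    plan: List[Request] = []
    for name in indices:
        dest = name + suffix  # hypothetical naming scheme for the new index
        plan.append((
            "POST",
            "/_reindex?wait_for_completion=true",
            {"source": {"index": name}, "dest": {"index": dest}},
        ))
        # Only issue the delete once the blocking reindex has returned cleanly.
        plan.append(("DELETE", "/" + name, {}))
    return plan


def hours_to_copy(size_gb: float, mb_per_s: float) -> float:
    """Wall-clock hours to copy size_gb at a constant mb_per_s."""
    return size_gb * 1024 / mb_per_s / 3600


if __name__ == "__main__":
    for req in sequential_plan(["logs-a", "logs-b"]):
        print(req)
    print("hours per 100 GB index:", round(hours_to_copy(100, 4.5), 2))
```

At 4.5 MB/s a single 100 GB index takes about 6.3 hours, so every 100 such indices adds roughly 26 days of strictly sequential work.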
Would it make sense to drop wait_for_completion=true and let the cluster parallelize reindex tasks? Would the cluster retry reindexes that failed due to a temporary lack of disk space?
Does it parallelize wait_for_completion=false reindex requests if we don't specify slices?
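For reference, the asynchronous variant looks roughly like the sketch below: `_reindex` with `wait_for_completion=false` returns a task id that can be polled at `/_tasks/<task_id>`, and an integer `slices` parameter asks for a sliced scroll (as far as we know, reindex slicing was added in 5.1). The sketch only builds the requests; it makes no claim about how the cluster schedules or retries several such tasks, which is exactly what the questions above ask.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def async_reindex_request(
    source: str, dest: str, slices: Optional[int] = None
) -> Tuple[str, str, Dict]:
    """Build a non-blocking _reindex request, optionally split into slices."""
    params = {"wait_for_completion": "false"}
    if slices is not None:
        if slices < 1:
            raise ValueError("slices must be a positive integer")
        params["slices"] = str(slices)  # integer slice count, no "auto" in 5.x
    path = "/_reindex?" + urlencode(params)
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return "POST", path, body


def task_status_path(task_id: str) -> str:
    """Path for polling the task id returned by a non-blocking reindex."""
    return "/_tasks/" + task_id


if __name__ == "__main__":
    print(async_reindex_request("logs-a", "logs-a-v5"))
    print(async_reindex_request("logs-a", "logs-a-v5", slices=4))
    print(task_status_path("node1:12345"))  # hypothetical task id
```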
What happens when reindexing from indices which are still being written to? Would the destination index only get documents that were in the source at the point in time when the reindex request was sent?