How can I speed up a long-running reindex operation?
The reindex reads from a source index of around 4.2 TB with 16 shards of roughly 300 GB each, in a cluster with 10 data nodes.
The target index has 90 shards. I've set the number of replicas to 0 and the refresh interval to -1 to speed things up, but so far it has indexed only 1 GB in the last 3 hours, which is very slow.
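A minimal sketch of the target-index settings described above, in Python using only the standard library. It only builds the REST request (method, path, JSON body) for creating the target index with 90 shards, 0 replicas and refresh disabled; it does not contact a cluster. The index name `target-index` and the helper `create_target_index_request` are hypothetical.

```python
import json

def create_target_index_request(index_name, shards=90):
    """Build a hypothetical 'create index' call with bulk-load friendly
    settings: no replicas and periodic refresh disabled."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,
                # "-1" disables periodic refresh until it is set back
                "refresh_interval": "-1",
            }
        }
    }
    return "PUT", f"/{index_name}", body

method, path, body = create_target_index_request("target-index")
print(method, path)
print(json.dumps(body, indent=2))
```

If the target index already exists, replicas and refresh interval can still be changed through `PUT /<index>/_settings`, but the shard count is fixed at creation time.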
This is the solution, thanks! It did help. I also force-merged the source index down to a single segment, since I don't expect any further writes to it anytime soon. I also disabled all types of shard allocation across the cluster, and my reindex now averages ~15,000 docs/sec, the best indexing rate this cluster has ever had.
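The two extra steps above can be sketched the same way, again only building the requests rather than sending them. `_forcemerge?max_num_segments=1` merges each shard of the index down to one segment, and `cluster.routing.allocation.enable` set to `none` disables shard allocation cluster-wide. The index name `source-index` and both helper functions are hypothetical.

```python
import json

# Values accepted by cluster.routing.allocation.enable
ALLOCATION_MODES = {"all", "primaries", "new_primaries", "none"}

def force_merge_request(index_name):
    # max_num_segments=1 merges each shard down to one segment;
    # only sensible for an index that no longer receives writes.
    return "POST", f"/{index_name}/_forcemerge?max_num_segments=1", None

def set_allocation_request(mode):
    # Persistent cluster setting controlling which shards may be allocated.
    if mode not in ALLOCATION_MODES:
        raise ValueError(f"unsupported allocation mode: {mode!r}")
    body = {"persistent": {"cluster.routing.allocation.enable": mode}}
    return "PUT", "/_cluster/settings", body

for method, path, body in (force_merge_request("source-index"),
                           set_allocation_request("none")):
    print(method, path)
    if body is not None:
        print(json.dumps(body))
```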
A disclaimer: because my reindex operations take so long, I would not recommend disabling allocation at the cluster level while new indices are being created in the cluster (their shards would stay unassigned and the cluster would go red).
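Given that risk, the temporary changes are usually undone once the reindex finishes. A minimal sketch, assuming the goal is to return to default allocation and to re-enable replicas and refresh on the target index: setting the persistent allocation setting to `null` removes the override so it falls back to its default (`all`). The replica count of 1 and the `1s` refresh interval are assumed example values, and `restore_requests` and `target-index` are hypothetical names.

```python
import json

def restore_requests(index_name, replicas=1, refresh_interval="1s"):
    # Assumed post-reindex cleanup; replicas=1 and "1s" are example values,
    # use whatever the index should run with in production.
    return [
        # None serializes to JSON null, which removes the persistent
        # override so allocation falls back to its default ("all").
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": None}}),
        ("PUT", f"/{index_name}/_settings",
         {"index": {"number_of_replicas": replicas,
                    "refresh_interval": refresh_interval}}),
    ]

for method, path, body in restore_requests("target-index"):
    print(method, path, json.dumps(body))
```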