stop all write operations and set the number of replicas to 20;
My question is about step 3: the replicas can only be recovered from the primary shard, so recovery is limited by the network and CPU of the data node where the primary is allocated.
I wonder why replicas cannot be recovered from other replicas once those replicas have started.
There are no write operations on the index during this time.
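For readers who want to reproduce the step where writes are stopped and the replica count is raised, here is a minimal sketch that builds the corresponding request for the index settings API. The host `http://localhost:9200` and the index name `my-index` are placeholders, and the helper `build_settings_request` is hypothetical. Using the `index.blocks.write` setting is only one way to stop writes, and it is an assumption here; the original setup may simply have stopped the indexing clients.

```python
import json
import urllib.request


def build_settings_request(base_url, index, replicas, block_writes=True):
    """Build (but do not send) a PUT request for the update index settings API.

    base_url and index are placeholders supplied by the caller.
    """
    settings = {"index.number_of_replicas": replicas}
    if block_writes:
        # Assumption: writes are stopped with an index-level write block.
        settings["index.blocks.write"] = True
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_settings_request("http://localhost:9200", "my-index", 20)
# urllib.request.urlopen(req)  # uncomment to send this to a live cluster
```

Once the request is sent, every new replica copy starts recovering from the primary, which is the bottleneck described above.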
There is no theoretical reason why we couldn't optimise the recovery process to fetch files from replicas as well. Doing so has been discussed and may be implemented at some point. Implementing it isn't entirely trivial, though: it would add complexity to the recovery state machine, which is one reason why this hasn't been worked on yet.