haw many shards are you recovering per node? Have you tried increasing cluster.routing.allocation.node_concurrent_recoveries to increase the level of papallelism as described here?
We have total 32 shared in our index, and 10 nodes. So I can see that 3 shards are going on one node (apart from 2 nodes, where there are 4 shared).
We did try to set cluster.routing.allocation.node_concurrent_recoveries to 4 on the cluster (Updated elasticsearch.yml file on all nodes + restarted service on all), although that didn't make any difference on the restoration performance.
We also tried the below cluster configs (Just to see if it impacts speed):
Update:
After trying multiple options, we weren't able to find the root-cause of this.
We decided to use an updated version of ES (version: 7.10.2).
We are now able to reach 33gbps speed with the updated cluster.
Intrestingly, we are using similar configs (similar ansible playbooks) for both the version, but with version 7.10.2, we were able to reach the target speed.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.