I have ES 2.3.3 running with 6 data nodes. Each index has one replica. ONE of the data nodes was down for a day. Now it's back to life and ES started relocating shards to it.
The thing is that it only relocates two shards at a time. Each node holds about 1 TB of data, so it looks like it will take many hours. How can I increase this number to speed up the process?
P.S. I've also set indices.recovery.max_bytes_per_sec to 200mb, though I see that the java process on the recovering node writes only 70-80 MB/s (and I've tested my disks to sustain 200+ MB/s).
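For reference, this throttle can be raised through the cluster settings API; a sketch like the following (the localhost:9200 endpoint is an assumption, adjust to your cluster):

```shell
# Raise the per-node recovery throttle to 200 MB/s.
# "transient" settings are lost on a full cluster restart; use "persistent" to keep them.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "200mb"
  }
}'
```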
Have a look at the allocation decider cluster.routing.allocation.node_concurrent_recoveries:
How many concurrent shard recoveries are allowed to happen on a node. Defaults to 2.
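This can be changed at runtime via the cluster settings API; a minimal sketch, assuming the cluster is reachable on localhost:9200:

```shell
# Allow up to 10 concurrent shard recoveries per node (default is 2).
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 10
  }
}'
```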
I changed it to 10; my cluster settings now look like the below. To test this, I evicted one node by excluding it through cluster.routing.allocation.exclude._ip. However, there were still only two relocating shards at a time.
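The eviction was done roughly like this (10.0.0.1 is a placeholder IP, not the actual node address):

```shell
# Exclude a node's IP from shard allocation, which drains its shards
# onto the remaining nodes (10.0.0.1 is a placeholder).
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}'
```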
I think this is because the setting you've mentioned relates to recovery, while what I'm experiencing is shard relocation, i.e. I'm joining a new node to the cluster (to scale out) and only two shards at a time are being moved to it.
So my original question still stands - how to boost shard relocation speed?
On the page I linked, you can find the setting cluster.routing.allocation.cluster_concurrent_rebalance
Allow to control how many concurrent shard rebalances are allowed cluster wide. Defaults to 2.
Relocations are counted as recoveries as well, so you should still increase cluster.routing.allocation.node_concurrent_recoveries if all relocations target one node.
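So in your scale-out case you would raise both knobs together; a sketch, again assuming the API is reachable on localhost:9200:

```shell
# Raise both the cluster-wide rebalance cap and the per-node recovery cap
# (both default to 2). Keep in mind the recovery throttle
# (indices.recovery.max_bytes_per_sec) still limits total throughput.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 10,
    "cluster.routing.allocation.node_concurrent_recoveries": 10
  }
}'
```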