Limit concurrent Elasticsearch shard relocations


I'm running Elasticsearch 7.5.2 with Curator 5.8. The cluster consists of some SSD hot-data nodes and some HDD warm-data nodes. Curator is configured to move older indices daily from the hot nodes to the warm ones. The problem is that IOPS are rather low on the HDD nodes (obviously), and the process often leaves these nodes temporarily unresponsive (too much IOWAIT), which causes the cluster to go red, kick off the recovery procedure, and so on. Annoying and unnecessary.
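For context, the daily move is done with a Curator allocation action along these lines. This is a sketch, not my exact config: the `box_type` attribute name and the 7-day threshold are placeholders for whatever your cluster actually uses.

```yaml
actions:
  1:
    action: allocation
    description: >-
      Tag indices older than 7 days so they relocate to warm (HDD) nodes.
      Assumes nodes carry a custom attribute like node.attr.box_type.
    options:
      key: box_type          # placeholder attribute name
      value: warm
      allocation_type: require
      wait_for_completion: False
    filters:
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 7        # placeholder age threshold
```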

The approach I'd like to take here is to limit concurrent shard relocations and/or the relocation throughput, so that the process takes a bit longer (which is OK) but is less traumatizing for the HDD nodes.

The docs state that, while it may seem so, cluster.routing.allocation.cluster_concurrent_rebalance is not the parameter to tune ("This setting does not limit shard relocations due to allocation filtering or forced awareness."). And I don't see any throughput- or concurrency-limiting parameter in the allocation filtering docs either.

So the question is: what parameters should I tune down? Thanks!

The number of concurrent recoveries of all kinds is controlled by cluster.routing.allocation.node_concurrent_recoveries. It defaults to 2 but you can reduce this to 1 if you wish.
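Since this setting is dynamic, it can be changed on a live cluster, for example:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 1
  }
}
```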

The total bandwidth consumed by recoveries on each node is limited by indices.recovery.max_bytes_per_sec, which defaults to a fairly conservative 40MB/s, but you can reduce this if your warm nodes can't cope even with that. Although you can set this setting cluster-wide with PUT _cluster/settings, you can also set it on a node-by-node basis in the respective elasticsearch.yml files if you want faster recoveries in your hot tier.
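For instance, to lower the limit cluster-wide (the 20mb value here is just an example, pick what your HDDs can sustain):

```
PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "20mb"
  }
}
```

Or, for per-node limits, in each warm node's elasticsearch.yml (this requires a node restart to take effect):

```yaml
indices.recovery.max_bytes_per_sec: 20mb
```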

Right. Will look into that. Thanks!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.