Limit concurrent Elasticsearch shard relocations

rsyne · February 18, 2020, 2:57pm

Hello.

I'm running Elasticsearch 7.5.2 with Curator 5.8. The cluster consists of some SSD hot-data nodes and some HDD warm-data nodes. Curator is configured to move older indices from the hot-data nodes to the warm ones every day. The problem is that IOPS are rather low on the HDD nodes (obviously), and the process often leaves these nodes temporarily unresponsive (too much IOWAIT), which causes the cluster to go red, start the recovery procedure and so on. Annoying and unnecessary.

The approach I'd like to take here is to limit concurrent shard relocations, their throughput, etc., so that the process runs a bit longer (which is OK) but is less traumatic for the HDD nodes.

The docs state that, while it may seem so, cluster.routing.allocation.cluster_concurrent_rebalance (https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html) is not the parameter to tune ("This setting does not limit shard relocations due to allocation filtering or forced awareness."). And I do not see any throughput- or concurrency-limiting parameter in the allocation filtering docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-filtering.html).

So the question is: what parameters should I tune down? Thanks!

DavidTurner · February 18, 2020, 3:49pm

The number of concurrent recoveries of all kinds is controlled by cluster.routing.allocation.node_concurrent_recoveries. It defaults to 2 but you can reduce this to 1 if you wish.
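As a rough sketch of what that change could look like, the Python below only builds and prints a request body for PUT _cluster/settings; it does not contact a cluster. The helper name `recovery_concurrency_body` is made up for illustration, and putting the setting under `persistent` rather than `transient` is an assumption.

```python
import json

SETTING = "cluster.routing.allocation.node_concurrent_recoveries"

def recovery_concurrency_body(limit: int) -> dict:
    """Build a body for PUT _cluster/settings (hypothetical helper)."""
    # "persistent" is an assumption; "transient" would go in the same place.
    return {"persistent": {SETTING: limit}}

# Print the request so it can be pasted into a console or sent with an HTTP client.
print("PUT _cluster/settings")
print(json.dumps(recovery_concurrency_body(1), indent=2))
```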

The total bandwidth consumed by recoveries on each node is limited by indices.recovery.max_bytes_per_sec, which defaults to a fairly conservative 40MBps, but you can reduce this if your warm nodes can't even cope with that. Although you can apply this setting cluster-wide with PUT _cluster/settings, you can also set it on a node-by-node basis in each node's elasticsearch.yml file if you want faster recoveries in your hot tier.
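To get a feel for the trade-off, here is a back-of-the-envelope sketch in Python. The 50 GiB shard size and the candidate limits are made-up numbers, the helper names are hypothetical, and it assumes the throttle is the only bottleneck and that the `mb` unit means 1024 * 1024 bytes. It also prints a cluster-wide settings body for the chosen limit, again assuming `persistent`.

```python
import json

MIB = 1024 ** 2
GIB = 1024 ** 3

def relocation_seconds(shard_bytes: int, limit_bytes_per_sec: int) -> float:
    """Time to copy one shard if the recovery throttle is the only limit."""
    return shard_bytes / limit_bytes_per_sec

def throttle_body(limit_mb: int) -> dict:
    """Build a body for PUT _cluster/settings (hypothetical helper)."""
    return {"persistent": {"indices.recovery.max_bytes_per_sec": f"{limit_mb}mb"}}

shard = 50 * GIB  # assumed shard size, purely illustrative
for limit_mb in (40, 20, 10):
    minutes = relocation_seconds(shard, limit_mb * MIB) / 60
    print(f"{limit_mb}mb per second: roughly {minutes:.0f} minutes per shard")

print("PUT _cluster/settings")
print(json.dumps(throttle_body(20), indent=2))
```

For the per-node variant, the same key and value would go into that node's elasticsearch.yml instead of the cluster settings request.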

rsyne · February 19, 2020, 2:24pm

Right. Will look into that. Thanks!

