I'm running Elasticsearch 7.5.2 with Curator 5.8. The cluster consists of some SSD hot-data nodes and some HDD warm-data nodes. Curator is configured to move older indices from the hot-data nodes to the warm ones once a day. The problem is that IOPS on the HDD nodes are (unsurprisingly) rather low, and the move often leaves these nodes temporarily unresponsive (too much iowait), which causes the cluster to go red, start the recovery procedure and so on. Annoying and unnecessary.
The approach I'd like to take here is to limit the number of concurrent shard relocations and/or their throughput, so that the process takes a bit longer (which is OK) but is less traumatizing to the HDD nodes.
The docs state that cluster.routing.allocation.cluster_concurrent_rebalance (https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html), although it looks like the obvious candidate, is not the parameter to tune: "This setting does not limit shard relocations due to allocation filtering or forced awareness." I also do not see any throughput- or concurrency-limiting parameter in the allocation filtering docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-filtering.html).
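For reference, as far as I understand it, the daily move boils down to an allocation-filtering settings update on each older index. A minimal sketch of the request body, assuming a custom node attribute called box_type with the value warm (both names are placeholders, not my real configuration, and allocation_filter_body is just a helper made up for this illustration):

```python
import json


def allocation_filter_body(attribute="box_type", value="warm"):
    """Build an index-settings body that requires shards to be placed on
    nodes whose custom attribute matches the given value.

    "box_type" and "warm" are hypothetical placeholders for whatever
    node attribute and value the cluster actually uses.
    """
    return {f"index.routing.allocation.require.{attribute}": value}


body = allocation_filter_body()
# Roughly what gets sent as: PUT /<index>/_settings
print(json.dumps(body, indent=2))
```

Once a setting like this is applied to many indices at once, all of their shards become eligible for relocation to the HDD nodes at the same time, which is where the I/O pressure comes from.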
So the question is: what parameters should I tune down? Thanks!