Hi All,
I was curious whether anyone knows if the guidance for the cluster.routing.allocation.* settings (e.g. cluster.routing.allocation.node_concurrent_recoveries and cluster.routing.allocation.node_initial_primaries_recoveries) is still accurate.
The reason I ask is this phrase:
Increasing this setting may cause shard movements to have a performance impact on other activity in your cluster, but may not make shard movements complete noticeably sooner. We do not recommend adjusting this setting from its default of X.
I was recently doing a rolling restart of a large cluster (each hot node has ~1.6k shards), and each node was taking ~1 hour on average to recover. After messing with the settings a bit:
- cluster.routing.allocation.node_concurrent_recoveries: 2 -> 4, then 4 -> 6
- cluster.routing.allocation.node_initial_primaries_recoveries: 4 -> 8, then 8 -> 10
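For anyone who wants to try the same change, here is a minimal sketch of how these two settings could be applied through the cluster update settings API (PUT /_cluster/settings). The helper names, the base URL, and the lack of authentication are assumptions for illustration, not something taken from my actual setup; a secured cluster would need credentials and TLS handling on top of this.

```python
import json
import urllib.request

CONCURRENT = "cluster.routing.allocation.node_concurrent_recoveries"
INITIAL_PRIMARIES = "cluster.routing.allocation.node_initial_primaries_recoveries"


def build_recovery_settings(concurrent=None, initial_primaries=None):
    """Build a persistent cluster-settings body (hypothetical helper).

    Passing None serializes to JSON null, which resets that setting
    to its default value.
    """
    return {
        "persistent": {
            CONCURRENT: concurrent,
            INITIAL_PRIMARIES: initial_primaries,
        }
    }


def put_cluster_settings(base_url, body):
    """Send the body to PUT /_cluster/settings (hypothetical helper).

    Assumes an unauthenticated HTTP endpoint such as http://localhost:9200.
    """
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # First step described above: 2 -> 4 and 4 -> 8.
    body = build_recovery_settings(concurrent=4, initial_primaries=8)
    print(json.dumps(body, indent=2))
    # Uncomment to send it to a real cluster (assumed URL):
    # put_cluster_settings("http://localhost:9200", body)
```

Calling build_recovery_settings() with no arguments produces a body that sets both values to null, which is a convenient way to go back to the defaults once the rolling restart is done.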
I noticed a fairly "linear" increase in recovery speed, going from ~1h -> ~30m, then ~30m -> ~20m.
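As a rough sanity check on the "linear" claim, the arithmetic can be sketched like this. It assumes the three recovery times line up with node_concurrent_recoveries values of 2, 4 and 6 in order, and it uses the approximate figures from this post (~1.6k shards, ~60/30/20 minutes), so treat the output as ballpark only. If throughput scales linearly with concurrency, the product of concurrency and recovery time should stay roughly constant.

```python
HOT_NODE_SHARDS = 1600  # "~1.6k shards" per hot node (approximate)

# (node_concurrent_recoveries, observed recovery time in minutes);
# pairing each time with a concurrency value is an assumption.
STEPS = [(2, 60), (4, 30), (6, 20)]


def summarize(steps, shards=HOT_NODE_SHARDS):
    """Return per-step throughput and the concurrency * time product."""
    rows = []
    for concurrency, minutes in steps:
        rows.append(
            {
                "concurrency": concurrency,
                "minutes": minutes,
                "shards_per_minute": shards / minutes,
                # Constant across steps if scaling is linear.
                "slot_minutes": concurrency * minutes,
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize(STEPS):
        print(
            f"concurrency={row['concurrency']} "
            f"minutes={row['minutes']} "
            f"shards/min={row['shards_per_minute']:.1f} "
            f"concurrency*minutes={row['slot_minutes']}"
        )
```

With these numbers the product comes out the same for all three steps, which is what I mean by "linear": doubling or tripling the concurrent recoveries cut the recovery time to roughly a half or a third.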
So, I'm a bit curious, with all of the recent improvements to Elasticsearch, is this guidance still accurate? Does anyone else adjust these settings?
For context, I'm currently on 8.16.2