I just decommissioned a node from my cluster and noticed many threads (15-20) performing shard relocation, which is great.
After about 30 minutes, the relocating threads dropped down to 3, which matches my cluster setting for
"node_concurrent_recoveries".
The online documentation states:

cluster.routing.allocation.cluster_concurrent_rebalance
Allow to control how many concurrent shard rebalances are allowed cluster wide. Defaults to 2. Note that this setting only controls the number of concurrent shard relocations due to imbalances in the cluster. This setting does not limit shard relocations due to allocation filtering or forced awareness.
Why did the number of concurrent relocation threads drop? Is this by design? It makes the decommission process longer, IMO, since "allocation filtering" doesn't have to follow that setting anyway.
I believe that during the first 30 minutes there was a lot of activity between nodes (not just involving the decommissioned node), which suggests the cluster was using both replicas and primaries as relocation sources (a many-to-many transfer).
But after 30 minutes or so, it became a one-to-many transfer from the decommissioned node alone.
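One way to check whether that's actually what is happening: the `_cat/recovery` API lists active shard recoveries with their source and target nodes, so you can see whether all remaining transfers originate from the decommissioned node (the column names below are from the standard API; adjust as needed):

```
GET _cat/recovery?v=true&active_only=true&h=index,shard,source_node,target_node,stage,bytes_percent
```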