Does version 8.10 still have the recovery/rebalance issue?

We often add many data nodes at once (say 10).
If I don't change the default settings, there are only 2 recovery tasks running at any given moment, and rebalancing takes forever.

If I set "cluster.routing.allocation.cluster_concurrent_rebalance" to 10, I get 10 tasks moving shards, which significantly shortens the rebalance time.
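For reference, this is how that setting can be changed at runtime via the cluster settings API (a sketch; the host/port and the value 10 are assumptions matching the example above):

```shell
# Raise the number of concurrent shard rebalance operations cluster-wide.
# Sketch only: adjust host/port and the value to your environment.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "cluster.routing.allocation.cluster_concurrent_rebalance": 10
    }
  }'
```

Using "persistent" keeps the setting across full cluster restarts; "transient" would apply it only until the next restart.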
But I often see shards being moved off one of the 10 new nodes even though its shard count is still significantly smaller than that of the existing old nodes (say 30 of them).
This creates a situation where leaving that setting at 10 causes the cluster to never finish recovery/rebalance, even after 24 hours. I am pretty sure the reason is that shards are being moved back and forth between new and old nodes, instead of only from old nodes to new nodes.

I know the recommendation is not to tweak that value, but moving 2 shards at a time is too slow.
Has anybody figured out the best approach to adding more data nodes without hitting this weird bug?


Actually, it is happening even with the default setting of 2 concurrent recoveries.
Just got a situation where:

es-data-033 -> es-data-027
es-data-017 -> es-data-033

So data-033 is both the source and the destination of a recovery. Both data-027 and data-033 are newly added nodes.

data-033 has 80 shards
data-017 has 101 shards
data-027 has 77 shards
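For anyone who wants to check for the same pattern, one way to watch the source and destination node of each in-flight relocation is the `_cat/recovery` API (a sketch, assuming a cluster reachable on localhost:9200):

```shell
# List only active recoveries, with the node each shard is moving
# from and to. Sketch only: adjust host/port to your environment.
curl -s "localhost:9200/_cat/recovery?active_only=true&v&h=index,shard,type,source_node,target_node"
```

A recovery of type `peer` whose target is a node that is simultaneously a source in another row is the back-and-forth movement described above.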


Nothing has changed there: whether or not you change the default, you can still hit this issue of shards moving around constantly.

If I'm not mistaken, it is related to this open issue.

There is another github issue about improving it here.

Thank you very much for the links.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.