Version 8.10 still has recovery/rebalance issue?

linkerc · March 1, 2024, 12:36am

We often add many data nodes at once (say 10).
If I leave the default setting alone, only 2 recovery tasks run at any given moment, and the rebalance takes forever.

If I set "cluster.routing.allocation.cluster_concurrent_rebalance" to 10, I get 10 tasks moving shards, which should significantly shorten the rebalance time.
But I often see shards being moved off one of the 10 new nodes even though its shard count is still significantly smaller than that of the existing old nodes (say 30 of them).
The result is that with the setting left at 10, the cluster never finishes recovery/rebalance, even after 24 hours. I am pretty sure the reason is that shards are being moved back and forth between new and old nodes, instead of only from old nodes to new nodes.
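
For anyone who wants to reproduce the change described above, here is a minimal sketch of how the setting can be applied through the cluster settings API (`PUT /_cluster/settings`). The base URL, the choice of a persistent rather than transient setting, and the helper name `build_rebalance_request` are assumptions made for illustration; the sketch only builds the request and does not send it.

```python
import json
import urllib.request

SETTING = "cluster.routing.allocation.cluster_concurrent_rebalance"


def build_rebalance_request(base_url, concurrency):
    """Build (but do not send) a PUT /_cluster/settings request.

    Passing None as the concurrency sends a JSON null, which resets
    the setting to its default.
    """
    body = {"persistent": {SETTING: concurrency}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical local cluster address; adjust for your environment.
    req = build_rebalance_request("http://localhost:9200", 10)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply it, call urllib.request.urlopen(req) against a reachable
    # cluster, adding whatever authentication that cluster requires.
```

Calling the helper with `None` instead of 10 builds the request that puts the setting back to its default once the new nodes have filled up.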

I know the recommendation is not to tweak that value, but moving 2 shards at a time is too slow.
Has anybody figured out the best approach to adding more data nodes without hitting this weird bug?

Thanks.

linkerc · March 1, 2024, 12:54am

Update:
Actually, it is happening even with the default setting of 2 concurrent shard recoveries.
I just hit a situation where:

es-data-033 -> es-data-027
es-data-017 -> es-data-033

So data-033 is both the source and the destination of a recovery. Both data-027 and data-033 are newly added nodes.

data-033 has 80 shards
data-017 has 101 shards
data-027 has 77 shards
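
The situation above can be checked mechanically. This is a rough sketch that takes the two relocations listed in this post as (source, target) pairs, flags any node that is both a source and a target, and computes the net shard change per node. The function names are hypothetical, and the pairs are typed in by hand rather than read from the cluster.

```python
from collections import Counter


def both_source_and_target(relocations):
    """Return nodes that appear as a source in one move and a target in another."""
    sources = {source for source, _ in relocations}
    targets = {target for _, target in relocations}
    return sorted(sources & targets)


def net_shard_change(relocations):
    """Return the net change in shard count per node once all moves complete."""
    change = Counter()
    for source, target in relocations:
        change[source] -= 1
        change[target] += 1
    return dict(change)


if __name__ == "__main__":
    # The two in-flight relocations reported in this post.
    moves = [
        ("es-data-033", "es-data-027"),
        ("es-data-017", "es-data-033"),
    ]
    print(both_source_and_target(moves))
    print(net_shard_change(moves))
```

For these two moves the net change on data-033 is zero, so in terms of shard count alone the pair has the same effect as a single move from data-017 to data-027. Shard count is only one of the inputs the balancer weighs, so this does not by itself prove the moves are wasted.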

linkerc · March 4, 2024, 9:36pm

nobody?

leandrojmp · March 4, 2024, 9:55pm

Nothing has changed here: if you change the setting from its default, you can still run into this issue of shards constantly moving around.

If I'm not mistaken, it is related to this open issue.

There is another GitHub issue about improving it here.

linkerc · March 4, 2024, 10:12pm

Thank you very much for the links.
