Weird rebalancing strategy

I am seeing several active recovery tasks where one shard is moving from Node A to Node B while, at the same time, a different shard is moving from Node C to Node A.
Wouldn't this cause rebalancing to never end?

Why would a node be both source & destination in rebalancing?

This is more or less what I am experiencing at the moment. I added 4 new nodes to my cluster, and the shard count on those 4 new nodes fluctuates up and down while their CPU usage is high. Rebalancing is taking a very long time.
Is this normal and expected?
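For reference, the in-flight shard moves described above can be listed with the cat recovery API; a minimal sketch (the column selection here is just one possible choice):

GET _cat/recovery?v&active_only=true&h=index,shard,type,stage,source_node,target_node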

Do you have any non-default settings that could affect the behaviour? Which version of Elasticsearch are you using?

7.3.1
Yes, I have increased the concurrent recovery and rebalance limits as shown below.

PUT _cluster/settings
{
  "persistent":{
    "cluster.routing.allocation.node_concurrent_recoveries":10,
    "cluster.routing.allocation.cluster_concurrent_rebalance":60
  }
}

I make similar changes whenever I add new nodes to speed up rebalancing. This is the first time I've seen behaviour where the new nodes never catch up to the shard count of the others. The shard count on those new nodes just fluctuates up and down, and I noticed one new node being both the source and the destination of moves. That seems like a wasted move.
This is also the first time we have grown the cluster to 25 data nodes. With 21 data nodes or fewer, I never saw this on the same version of ES (7.3.1).
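For completeness, the per-node shard counts I am watching can be checked with the cat allocation API; a minimal sketch:

GET _cat/allocation?v&s=node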

I think this can happen if you adjust those settings; you should revert those changes. The best way to speed up recoveries is with indices.recovery.max_bytes_per_sec. 7.3.1 is also really old, long past EOL; IIRC there were some big improvements to recovery speed in more recent versions, so you should upgrade to a supported version as a matter of urgency too.
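A minimal sketch of that change, resetting the two overridden settings back to their defaults and raising the recovery bandwidth cap (the 100mb value is only illustrative; pick a figure that suits your hardware and network):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": null,
    "cluster.routing.allocation.cluster_concurrent_rebalance": null,
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}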

