The ES cluster warm-node rebalance keeps running and some nodes' shard counts keep decreasing for a long time

Version: 7.17

The warm-node rebalancing in our ES cluster keeps running and never stops. The warm nodes' indices come from the hot nodes via ILM policy; I don't update the warm indices directly. I do have a snapshot task, but I don't think it is the cause of the continuous rebalancing.
I tried reducing and enlarging cluster_concurrent_rebalance and waiting, but the rebalancing still didn't stop.
The weird thing is that warm-20's and warm-15's disk usage and shard counts have kept decreasing for 8 hours; screenshot below:

I turned on the allocator trace logging and found that when warm-20 is the move source node, the index weight is more than 30 (which is not reasonable), but when warm-20 is the move target node, the weight is negative or less than 5. I calculated the weight myself and the 30+ value looks wrong: warm-20 has far fewer shards than the other warm nodes. I think the bad weight value is the cause of the rebalancing issue, so what is the reason behind this? See the logging screenshot below.
Node ID "WvAX8WYTQiGIYdFdosEBjw" corresponds to node "logging-warm-20".


Code: /elasticsearch-7.17.3-sources.jar!/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java:552

private void balanceByWeights() { ....
    "Balancing from node [{}] weight: [{}] to node [{}] weight: [{}] delta: [{}]",

My cluster settings:
GET /_cluster/settings?include_defaults=true
"persistent" : {
"cluster.routing.allocation.allow_rebalance" : "indices_all_active",
"cluster.routing.allocation.balance.threshold" : "3",
"cluster.routing.allocation.cluster_concurrent_rebalance" : "2",
"cluster.routing.allocation.enable" : "all"
},
"transient" : {
"cluster.routing.allocation.balance.threshold" : "2.0",
"cluster.routing.allocation.cluster_concurrent_rebalance" : "24",
"cluster.routing.allocation.disk.watermark.flood_stage" : "99%",
"cluster.routing.allocation.disk.watermark.high" : "94%",
"cluster.routing.allocation.disk.watermark.low" : "94%",
....
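
Note that transient settings take precedence over persistent ones, so the values actually in effect here are balance.threshold 2.0 and cluster_concurrent_rebalance 24. A quick sketch (same assumed local endpoint and requests library as above) to print the flattened allocation-related settings per scope:

import requests

# Print the allocation-related settings per scope; a transient entry
# overrides a persistent entry of the same name.
resp = requests.get(
    "http://localhost:9200/_cluster/settings",
    params={"flat_settings": "true", "include_defaults": "true"},
)
resp.raise_for_status()
settings = resp.json()
for scope in ("persistent", "transient"):
    for key, value in sorted(settings.get(scope, {}).items()):
        if "routing.allocation" in key:
            print(f"{scope:10} {key} = {value}")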
Current allocation info:
GET _cat/allocation?v=true&s=shards&h=node,shards,disk.*
node shards disk.indices disk.used disk.avail disk.total disk.percent
logging-hot-11 150 1.1tb 1.2tb 492gb 1.7tb 72
logging-hot-12 151 1.2tb 1.3tb 337.6gb 1.7tb 80
logging-hot-8 151 1.2tb 1.3tb 400.8gb 1.7tb 77
logging-hot-19 151 1.1tb 1.1tb 533.1gb 1.7tb 69
logging-hot-15 151 1tb 1.1tb 583.6gb 1.7tb 66
logging-hot-18 151 973.3gb 1tb 691.6gb 1.7tb 60
logging-hot-1 152 1.2tb 1.3tb 381.1gb 1.7tb 78
logging-hot-9 152 1009.1gb 1tb 657.4gb 1.7tb 62
logging-hot-5 152 1.1tb 1.1tb 533.6gb 1.7tb 69
logging-hot-17 152 1019.2gb 1tb 646.2gb 1.7tb 63
logging-hot-2 152 1.1tb 1.1tb 534gb 1.7tb 69
logging-hot-0 152 1.1tb 1.2tb 513.6gb 1.7tb 70
logging-hot-7 152 984gb 1tb 680.6gb 1.7tb 61
logging-hot-14 153 1.1tb 1.1tb 533.6gb 1.7tb 69
logging-hot-3 153 1.1tb 1.2tb 529.7gb 1.7tb 69
logging-hot-16 153 1tb 1.1tb 619.1gb 1.7tb 64
logging-hot-6 154 1.1tb 1.2tb 503.5gb 1.7tb 71
logging-hot-10 154 1.1tb 1.1tb 531.8gb 1.7tb 69
logging-hot-4 154 1tb 1.1tb 575.2gb 1.7tb 67
logging-hot-13 154 1tb 1.1tb 608.9gb 1.7tb 65
logging-warm-20 165 1.7tb 2.1tb 4.9tb 7tb 29
logging-warm-15 205 3tb 3.3tb 3.7tb 7tb 47
logging-warm-8 292 5.5tb 5.9tb 1.1tb 7tb 83
logging-warm-2 294 5.2tb 5.6tb 1.4tb 7tb 79
logging-warm-12 305 5.6tb 5.9tb 1.1tb 7tb 84
logging-warm-4 306 5.7tb 6tb 1003.8gb 7tb 86
logging-warm-22 307 5.6tb 6tb 1tb 7tb 85
logging-warm-9 310 5.4tb 5.8tb 1.2tb 7tb 82
logging-warm-7 311 5.8tb 6.2tb 879.2gb 7tb 87
logging-warm-23 311 5.8tb 6.2tb 882.8gb 7tb 87
logging-warm-14 312 5.7tb 6.1tb 992.5gb 7tb 86
logging-warm-10 315 5.3tb 5.7tb 1.2tb 7tb 81
logging-warm-18 315 5.4tb 5.8tb 1.2tb 7tb 82
logging-warm-11 315 5.4tb 5.8tb 1.2tb 7tb 82
logging-warm-6 316 5.2tb 5.5tb 1.4tb 7tb 79
logging-warm-16 317 5.1tb 5.4tb 1.5tb 7tb 77
logging-warm-17 317 5.2tb 5.6tb 1.4tb 7tb 79
logging-warm-3 317 5.6tb 5.9tb 1.1tb 7tb 84
logging-warm-19 318 4.5tb 4.8tb 2.1tb 7tb 69
logging-warm-0 318 4.7tb 5.1tb 1.9tb 7tb 72
logging-warm-21 318 5.4tb 5.7tb 1.2tb 7tb 82
logging-warm-5 319 5.2tb 5.6tb 1.4tb 7tb 80
logging-warm-13 320 5.5tb 5.9tb 1.1tb 7tb 83
logging-warm-1 321 5.6tb 5.9tb 1tb 7tb 84
warm-disk-usage trend

I calculated the warm-20 index weight myself; it may not be right, but as a reference it should be negative, not the 30+ shown in the screenshot.

args:  node_id=None, node=logging-warm-20, index=phpback_2022-05-12
----------------------
Formula: weight_index = indexBalance * (shards_of_node_index_count - avg_shards_per_node_index) = 0.55 * (2 - 1.043) = 0.526
Formula: weight_shard = shardBalance * (node_shards_count - avg_shards_per_node) = 0.45 * (168 - 223.652) = -25.043
Result:  node: logging-warm-20, index: phpback_2022-05-12, node_index_weight: 0.526 + (-25.043) = -24.517
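
For comparison, a small Python sketch of the weight formula used by BalancedShardsAllocator (variable names are mine; 0.55 and 0.45 are the defaults of cluster.routing.allocation.balance.index and cluster.routing.allocation.balance.shard), plugging in the same numbers as above:

INDEX_BALANCE = 0.55  # cluster.routing.allocation.balance.index (default)
SHARD_BALANCE = 0.45  # cluster.routing.allocation.balance.shard (default)

def node_index_weight(node_shards, avg_shards_per_node,
                      node_index_shards, avg_index_shards_per_node):
    # The allocator normalises the two factors by their sum; with the
    # default values the sum is 1.0, so it makes no difference here.
    total = INDEX_BALANCE + SHARD_BALANCE
    theta_shard = SHARD_BALANCE / total
    theta_index = INDEX_BALANCE / total
    return (theta_shard * (node_shards - avg_shards_per_node)
            + theta_index * (node_index_shards - avg_index_shards_per_node))

# logging-warm-20 vs. index phpback_2022-05-12, numbers from the calculation above:
print(node_index_weight(168, 223.652, 2, 1.043))  # ~ -24.517, i.e. clearly negative

So a weight above 30 for this node does look inconsistent with its shard count.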

Strangely, I have just been investigating another case that looks very similar and found some strange effects when too much concurrent balancing is allowed. I opened #87279 with some more details, but the short answer is "remove cluster.routing.allocation.cluster_concurrent_rebalance from your config".
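
In case it helps, a sketch (assuming a plain http://localhost:9200 endpoint and the Python requests library) of removing that setting from both scopes by setting it to null, so it falls back to its default:

import requests

# Clearing the overrides lets the setting fall back to its default value.
requests.put(
    "http://localhost:9200/_cluster/settings",
    json={
        "persistent": {"cluster.routing.allocation.cluster_concurrent_rebalance": None},
        "transient": {"cluster.routing.allocation.cluster_concurrent_rebalance": None},
    },
).raise_for_status()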
