I have noticed that in version 8.10.2, shard allocation seems to prioritize storage space first and then the shard count delta.
Which is fine. I don't really have a preference either way.
But as a result, shard movement seems to happen more often now.
By default, the shard count delta between data nodes is larger in version 8.10.2.
So I constantly see recovery tasks running.
This was not the case in version 7.15. The older version is stricter about maintaining the shard count difference, so I could set "cluster.routing.allocation.balance.threshold" to, say, 3 and the delta would stay within 3 between data nodes. I rarely saw shard recovery (unless I deleted indices).
But with version 8.10.2, I always see recovery. Is this expected?
Version 8.10.2 seems to be in a perpetual cycle of shard rebalancing.
After setting "cluster.routing.allocation.cluster_concurrent_rebalance" to 30, the rebalancing jobs jumped from 2 to 30 immediately.
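For reference, the change was a standard cluster settings update along these lines (a sketch; I used a persistent setting, but transient would work too):

```
# illustrative value only: allow up to 30 concurrent shard rebalances
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 30
  }
}
```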
I'll keep it at 30 to see if there were simply too many pending moves for the default of 2 to keep up with...
But fundamentally the issue seems to be too many shard movements. The cluster never settles into a stable state.
We do create/delete some indices hourly and daily, but version 7.15 suggests our usage pattern was not an issue for the older algorithm.
This typically means you should increase cluster.routing.allocation.balance.threshold. It looks like you currently have it set at 3, but you can reasonably go much higher than that.
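For example, something like this (the value 10 is purely illustrative; pick whatever suits your cluster):

```
# illustrative value only: raise the threshold so small imbalances
# no longer trigger shard moves
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.threshold": 10
  }
}
```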
ok. I'll give that a try.
Just to mention that when I set it to 3, the delta never settles below 3. I attributed that to 8.10.2 prioritizing storage.
Even with the delta above the threshold, I do see the cluster settle for periods of time, meaning no rebalancing is occurring, which I interpreted to mean the threshold is no longer useful.
I'll give it a larger value to see if it's less jittery.
Yeah, a threshold of 3 no longer means "shard counts balanced within ±3"; it's now balancing a mix of shard count, disk space, and write load. That means one way to balance the cluster would be to put lots of tiny shards on one node and the few remaining enormous shards on the other node.
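If you want shard counts to dominate again, you can in principle tune the relative weights of those factors. A sketch, assuming the 8.x balance weight settings (check the docs for your exact version; zeroing the disk and write-load weights roughly approximates the old count-based behaviour, but it's generally not recommended to change these):

```
# illustrative only: remove disk usage and write load from the
# balancing formula, leaving shard and index counts
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.disk_usage": 0.0,
    "cluster.routing.allocation.balance.write_load": 0.0
  }
}
```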
Hi @linkerc, what value did you set for cluster.routing.allocation.balance.threshold that still results in shard movements? Also, does it keep your cluster in an unbalanced state?