Shard numbers no longer equal (not even close) among cluster nodes

Hello, I recently upgraded a 5-node Elasticsearch cluster from version 7.3 to 7.17, and then to 8.9.
I have never disabled shard allocation or rebalancing, and up through 7.17 I observed what I understood to be the expected behavior: the shard counts on the cluster nodes were equal, or as close to equal as possible.
But after upgrading to 8.9, that no longer seems to be the case. Once automatic shard allocation finished (I confirmed there were no relocating shards, and GET /_cat/recovery?active_only=true returned empty results), the five nodes held very different numbers of shards: 112, 145, 132, 138, 142. The difference is not as extreme as, say, 10 versus 200, but it's still not even close to "equal".
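For anyone wanting to run the same check, a cat API request along these lines (Dev Tools syntax; the `h` column selection is just one possible choice) lists the per-node shard counts together with disk usage:

```
GET /_cat/allocation?v&h=node,shards,disk.indices,disk.percent
```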
So what happened? Does Elasticsearch now use an improved strategy for allocating and rebalancing shards across the cluster, so that equal counts are no longer necessary?
I also tried restarting the cluster, which didn't change anything.

The shard balancing heuristic was changed in version 8.6 to also consider the disk usage of the shards when rebalancing them.

You should not expect your nodes to have an equal number of shards anymore.
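If you want to inspect the weights the newer balancer uses, you can dump the relevant settings and their defaults with something like this (the `filter_path` is just a convenience; double-check the exact setting names against the docs for your version):

```
GET /_cluster/settings?include_defaults=true&filter_path=defaults.cluster.routing.allocation.balance
```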

That's great! Thank you.

According to Shard rebalancing settings:

A cluster is balanced when it has an equal number of shards on each node, with all nodes needing equal resources, without having a concentration of shards from any index on any node.

I'm not sure whether it's a problem with my reading of the language, but that part of the documentation made me think that a cluster with such varying shard counts (112, 145, 132, 138, 142) was far from balanced.
But as you pointed out, this way of balancing seems better overall, even though one can no longer tell whether rebalancing has finished just by comparing shard counts across nodes.
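Just to put a number on "far from balanced" (plain arithmetic on the counts quoted above, nothing Elasticsearch-specific):

```python
# Per-node shard counts quoted earlier in the thread.
counts = [112, 145, 132, 138, 142]

spread = max(counts) - min(counts)  # widest gap between any two nodes
mean = sum(counts) / len(counts)    # average shards per node

print(spread)                   # 33
print(mean)                     # 133.8
print(round(spread / mean, 2))  # 0.25
```

So the busiest node holds roughly 25% (of the mean) more shards than the emptiest one, which the newer balancer apparently considers acceptable if it evens out the resource usage.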

The key phrase (added to the docs in 8.6) is here:

If your cluster contains shards with varying resource needs then Elasticsearch must find a compromise between equalizing the shard count and balancing the resources.

That's quite reasonable. Thanks!

Specifically, have a look at this portion of the docs you referenced:

The weight of a node depends on the number of shards it holds and on the total estimated resource usage of those shards expressed in terms of the size of the shard on disk and the number of threads needed to support write traffic to the shard.
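To see how unevenly disk usage is spread across your own shards (which is part of what that weight accounts for), a cat request roughly like this lists shard sizes, largest first:

```
GET /_cat/shards?v&h=index,shard,prirep,node,store&s=store:desc
```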

Great, thank you!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.