I, too, would like my primary shards to be balanced across the cluster.
I often reduce the number of replicas on older indices to optimize disk space.
When the cluster is reasonably idle, shards will recover from the just-marked-for-deletion copies.
But quite often this means hours of indices being relocated to nodes that just deleted them.
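Reducing replicas on older indices is a plain index-settings update; a minimal sketch, with a hypothetical index name:

```shell
# Drop the replica count on an older index (index name is hypothetical).
# The reclaimed disk space comes at the cost of redundancy for that index.
curl -X PUT "localhost:9200/logs-2019.04/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "number_of_replicas": 0 } }'
```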
Search engine landed me here:
Best explanation of problems is here: how-to-rebalance-primary-shards-on-elastic-cluster/176060/4
The requirement for rebalancing was also asked about in
primary-shard-rebalancing/6173 and cluster-reroute-automatically-reroute-and-rebalance-all-primary-shards/161757
Is this the proper way to inject solutions into older topics?
Primary purpose of this topic was to link existing topics to the github issue - no solution yet:
05:09PM - 25 Apr 19 UTC
# Primary shard balancing
There are use cases that call for a mechanism to balance primary shards across all the nodes, so that the number of primaries is uniformly distributed.
## Problem description
There are situations where primary shards end up unevenly distributed across the nodes. For instance, during a rolling restart the last restarted node won't host any primary shards, because the other nodes assume the primary role while it is down.
This issue has popped up on other occasions, and the usual answer was that the primary/replica role is not an issue because the workload a primary or a replica carries is similar. But there are important use cases where this does not apply.
For instance, in an indexing-heavy scenario where indexing must be implemented as a scripted upsert, the execution of the upsert logic falls onto the primary shards, and replicas just have to insert the result.
In these cases, unbalanced primaries exert a bigger workload on the nodes hosting them. This can even overload the cluster, as the bottleneck becomes the capacity of the nodes hosting the primaries rather than the combined capacity of all the cluster's nodes.
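A scripted upsert of the kind described above might look like the following sketch (the index name, document id, and `counter` field are made up for illustration); the script executes on the primary shard, while replicas only index the resulting document:

```shell
# Scripted upsert: the Painless script runs on the primary shard only;
# replicas receive the final document.
curl -X POST "localhost:9200/my-index/_update/1" \
  -H 'Content-Type: application/json' \
  -d '{
    "scripted_upsert": true,
    "script": {
      "source": "if (ctx._source.counter == null) { ctx._source.counter = params.n } else { ctx._source.counter += params.n }",
      "params": { "n": 1 }
    },
    "upsert": {}
  }'
```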
[Related thread in official forum](https://discuss.elastic.co/t/how-to-rebalance-primary-shards-on-elastic-cluster/176060/3)
There are some workarounds for this situation, but they are not efficient:
1. Once cluster primaries are unbalanced we could use the [Cluster reroute API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html) to try to balance them, swapping a replica with a primary in a "reroute transaction". To do this, we first need more nodes in the cluster than copies of each shard (primary plus replicas), because a shard cannot be rerouted to a node that already holds a copy of it.
> As an example, consider a simplified scenario with 3 nodes and 2 shards with 2 replicas each, where rerouting is not possible:
> Node 0: Shard 0 (primary), Shard 1 (primary)
> Node 1: Shard 0 (replica), Shard 1 (replica)
> Node 2: Shard 0 (replica), Shard 1 (replica)
> *Rerouting is not possible: we cannot swap shard 0 (primary) on Node 0 with shard 0 (replica) on Node 1, because every node already holds a copy of shard 0.*
> Compare with a scenario of 3 nodes and 3 shards with 1 replica each, where rerouting is possible:
> Node 0: Shard 0 (primary), Shard 1 (primary)
> Node 1: Shard 0 (replica), Shard 2 (primary)
> Node 2: Shard 1 (replica), Shard 2 (replica)
But even when rerouting is possible, it means that shard data has to be moved from one node to another (I/O and network cost).
Also, there is no automated way to detect which primaries are unbalanced and which shards can be swapped, and to execute that rerouting in small chunks so as not to overload node resources. But implementing a utility or script that does this is feasible (see possible solutions).
2. Simply "throw a bag of hardware" at the problem: have enough hardware to support an unbalanced scenario. Then we can limit the number of shards per node (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/allocation-total-shards.html), both primaries + replicas, so they are distributed between nodes. This approach is unnecessarily expensive, and impractical at certain scales.
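Workaround 1 could be sketched with the reroute API as below (index and node names are hypothetical, and `node-3` must hold no copy of shard 0). Note that this relocates shard data rather than switching roles, which is exactly the inefficiency described above:

```shell
# Step 1: relocate the primary copy of shard 0 off node-0 to a spare node.
curl -X POST "localhost:9200/_cluster/reroute" \
  -H 'Content-Type: application/json' \
  -d '{ "commands": [ { "move": {
        "index": "my-index", "shard": 0,
        "from_node": "node-0", "to_node": "node-3" } } ] }'

# Step 2: once that relocation finishes, move a replica copy onto node-0.
curl -X POST "localhost:9200/_cluster/reroute" \
  -H 'Content-Type: application/json' \
  -d '{ "commands": [ { "move": {
        "index": "my-index", "shard": 0,
        "from_node": "node-1", "to_node": "node-0" } } ] }'
```

Both moves copy the full shard data over the network; the primary keeps its primary role wherever it lands.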
## Possible solutions? (all of them imply new features to be implemented)
So, let's consider possible solutions (take into account that I don't know Elasticsearch internals):
1. Enhance the Cluster reroute API so that you can "reroute" the role of a shard: say we reroute a primary shard from one node to another node that hosts a replica of it. The data is not moved between the nodes; instead, the replica is promoted to primary and the primary demoted to replica.
If the reroute API had this functionality, it would be possible to develop a script that detects primary shard imbalance and reroutes primary shard roles accordingly.
2. Modify the cluster shard allocation protocol so that primaries are automatically balanced when shards are assigned to nodes. This could be active by default, or optional (configured via new cluster settings under `cluster.routing...`).
3. Any other ideas?
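The imbalance-detection step mentioned in solution 1 needs no new features; counting started primaries per node from `_cat/shards` output is enough to spot the skew. A sketch with a made-up sample listing (in practice, pipe in `curl -s "localhost:9200/_cat/shards"`):

```shell
# Count STARTED primary shards per node from a _cat/shards-style listing
# (columns: index, shard, prirep, state, docs, store, ip, node).
awk '$3 == "p" && $4 == "STARTED" { c[$8]++ } END { for (n in c) print n, c[n] }' <<'EOF'
my-index 0 p STARTED 100 1mb 10.0.0.1 node-0
my-index 1 p STARTED 100 1mb 10.0.0.1 node-0
my-index 0 r STARTED 100 1mb 10.0.0.2 node-1
my-index 1 r STARTED 100 1mb 10.0.0.3 node-2
EOF
```

For the sample data this prints `node-0 2`: both primaries sit on one node, so the script would flag the cluster as unbalanced.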
If it is currently not possible to demote a primary to a replica, would it be possible to bring up the replacement replica before failing the primary? Would that avoid the degradation of redundancy? I mean that it would work like Elasticsearch's index clone: hardlinking the segments instead of copying them.
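For reference, the existing index clone API already does hardlink-based copying where the filesystem supports it (index names here are hypothetical; the source index must be made read-only first):

```shell
# Block writes on the source index (required before cloning).
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index.blocks.write": true }'

# Clone the index: segment files are hardlinked rather than copied
# when the filesystem allows it.
curl -X POST "localhost:9200/my-index/_clone/my-index-clone"
```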
May 24, 2022, 8:13am
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.