How to rebalance primary shards on an Elasticsearch cluster

(oraclept) #1

Hi - How can I rebalance primary shards on an Elasticsearch cluster?
Below is the primary shard allocation on our 5-node cluster. We are seeing huge load spikes on node4, which holds the most primaries.

  6 node1
 38 node2
116 node3
162 node4
 35 node5

Replica shards

169 node1
136 node2
 59 node3
 12 node4
140 node5

When we look at the total shard allocation (primaries plus replicas) per node, it looks balanced.

175 node1
174 node2
175 node3
174 node4
175 node5
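
Per-node counts like the ones above can be reproduced from the output of `GET _cat/shards?h=index,shard,prirep,state,node`. The following is a minimal sketch, assuming that five-column layout and that only `STARTED` shards should be counted; the function name `count_shards_per_node` is made up for illustration.

```python
from collections import Counter


def count_shards_per_node(cat_shards_text):
    """Count started primaries and replicas per node.

    Assumes each line has the columns: index shard prirep state node
    (i.e. _cat/shards was called with h=index,shard,prirep,state,node).
    """
    primaries = Counter()
    replicas = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Unassigned shards have no node column, so they have fewer fields.
        if len(fields) < 5:
            continue
        _index, _shard, prirep, state, node = fields[:5]
        # Skip relocating or initializing copies to keep the counts simple.
        if state != "STARTED":
            continue
        if prirep == "p":
            primaries[node] += 1
        else:
            replicas[node] += 1
    return primaries, replicas
```

Comparing `primaries` with the sum of both counters per node shows the situation described here: the totals are even while the primaries are skewed.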

I changed the settings below, but it did not do anything:

cluster.routing.rebalance.enable to primaries
cluster.routing.allocation.allow_rebalance to always.
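
For reference, the two settings above would typically be applied through `PUT _cluster/settings`. Below is a minimal sketch that only builds the request body; the helper name `rebalance_settings_body` is hypothetical, and the choice of `persistent` over `transient` is an assumption. As far as I know, these settings control which shard types may be rebalanced and when rebalancing is allowed, not how many primaries each node should hold, which would be consistent with them having no visible effect here.

```python
import json

# Values these two settings accept.
ENABLE_VALUES = {"all", "primaries", "replicas", "none"}
ALLOW_VALUES = {"always", "indices_primaries_active", "indices_all_active"}


def rebalance_settings_body(enable="primaries", allow_rebalance="always"):
    """Build a body for PUT _cluster/settings (helper name is hypothetical)."""
    if enable not in ENABLE_VALUES:
        raise ValueError(f"unsupported rebalance.enable value: {enable}")
    if allow_rebalance not in ALLOW_VALUES:
        raise ValueError(f"unsupported allow_rebalance value: {allow_rebalance}")
    return {
        "persistent": {  # assumption: persistent rather than transient
            "cluster.routing.rebalance.enable": enable,
            "cluster.routing.allocation.allow_rebalance": allow_rebalance,
        }
    }


if __name__ == "__main__":
    print(json.dumps(rebalance_settings_body(), indent=2))
```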

I want to balance shards so that all nodes get an equal number of primary shards, if possible at the index level. We have around 60 indices, each with a different number of shards.

(Andres) #2

++ Expanding on this question with our use case, which explains why we also think primary shard rebalancing is needed.

In other threads this has been answered as unnecessary, on the grounds that primary balancing won't matter much because primary and replica shards should carry equivalent loads.

But building on @oraclept's question, we have a use case where this is not true: primaries do much heavier work, so if primaries are not balanced, some nodes can be completely overloaded while others are idle.

Consider, for instance, an indexing-heavy scenario where indexing involves upserting (and sadly upserting can't be avoided in our case).

As part of indexing, primary shards have to query the data and run the upsert logic, and then propagate the changes to the replicas.
Replicas don't need to execute the upsert logic, so the primaries' work is much heavier than the replicas', and the workload on the cluster is as unbalanced as the primary shards are.

So we also wonder whether it is possible to have primaries balanced across a cluster, either automatically or by running some kind of procedure?

(David Turner) #3

This question does come up occasionally, and you're right that in most cases it's not necessary. However, it is also true that an update-heavy workload will consume more resources on the primary than on a replica (and conversely this is the only kind of workload that would cause such an imbalance).

There is no simple way to do primary balancing. The allocator doesn't really distinguish between primaries and replicas, and there is no API to promote a specific replica to a primary. You can try using the cluster reroute API to move a primary somewhere else, or to cancel the allocation of a primary (which will then promote a replica) but it doesn't cover all the cases.
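
The two reroute options mentioned above could be sketched as request bodies for `POST _cluster/reroute`. This is a minimal illustration, not a recommended procedure: the helper names and the index and node names in the usage line are hypothetical, while the `move` and `cancel` commands and the `allow_primary` flag are part of the reroute API. Adding `?dry_run=true` to the request should let you see the resulting state without applying it.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Relocate one started shard copy (primary or replica) to another node."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def cancel_command(index, shard, node, allow_primary=False):
    """Cancel a shard copy on a node.

    Cancelling a primary requires allow_primary=True; a replica
    is then promoted to take its place.
    """
    return {"cancel": {"index": index, "shard": shard, "node": node,
                       "allow_primary": allow_primary}}


def reroute_body(*commands):
    """Wrap one or more commands in a _cluster/reroute request body."""
    return {"commands": list(commands)}


if __name__ == "__main__":
    # Hypothetical names: move shard 0 of "my-index" off the busy node.
    print(json.dumps(reroute_body(move_command("my-index", 0, "node4", "node1")), indent=2))
```

Note that cancelling a primary leaves the choice of which replica gets promoted to the cluster, which is one reason this does not cover all the cases.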

I can't see an issue on GitHub specifically requesting this feature. Could you open one? There is no simple way to achieve this with today's allocator, but it'd be good to start to quantify the support for this feature there.

(Andres) #4

Thanks for the quick response.

Sure, I will open an issue on GitHub as a new feature request, and will also try to push it through our support subscriptions.

The cluster reroute API workaround is exactly what we are using in some clusters (moving a replica and a primary in the same reroute request, so that the primary ends up on a node with fewer primaries). But we have found two main drawbacks that make this solution insufficient:

  1. It's a resource-heavy operation, as shard data has to be moved around.
  2. We need to have a spare node that doesn't host any shard copy from the affected index (we cannot swap a replica and a primary between 2 nodes without using an intermediate node).
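
To make the two drawbacks concrete, here is a rough sketch of the swap through an intermediate node as three `move` commands. The function name `plan_primary_swap` and the node names in the usage note are hypothetical. The sketch returns one reroute body per step, on the assumption that a caller may want to wait for each relocation to finish before sending the next; whether the steps can be combined into one request is not something this sketch decides.

```python
def plan_primary_swap(index, shard, primary_node, replica_node, spare_node):
    """Plan moves so the primary ends up on replica_node and the replica
    on primary_node, using spare_node as a temporary holder.

    Returns a list of _cluster/reroute bodies, one per step.
    """
    if len({primary_node, replica_node, spare_node}) != 3:
        raise ValueError("primary, replica and spare nodes must all differ")

    def move(src, dst):
        return {"commands": [{"move": {"index": index, "shard": shard,
                                       "from_node": src, "to_node": dst}}]}

    return [
        move(replica_node, spare_node),    # 1. park the replica on the spare node
        move(primary_node, replica_node),  # 2. relocate the primary to its new home
        move(spare_node, primary_node),    # 3. bring the replica back to the old primary node
    ]
```

For example, `plan_primary_swap("my-index", 0, "node4", "node1", "node5")` yields three full shard relocations for a single swapped primary, which illustrates both the data-movement cost and the need for a node that holds no copy of that shard.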