Difference between 'rebalance.enable' and 'allocation.enable'


#1

I was looking for a setting to stop the big old shards that contain data from moving when a new node is added.
I found two settings on this page https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html
namely 'cluster.routing.allocation.enable' and 'cluster.routing.rebalance.enable'.

I set allocation to new_primaries and, sure enough, the old shards stopped moving around.
I naively assumed that allocation is for moving shards and rebalance is for moving data between shards, but when I tried setting rebalance to 'none' the shards stopped moving too.
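
For anyone who wants to reproduce this, here is a minimal sketch of the request bodies involved. It only builds the JSON for `PUT /_cluster/settings` and sends nothing. The helper name `routing_settings_body` and the choice of the `persistent` scope are assumptions made for illustration; the setting names and the listed values are the ones given in the shard allocation documentation.

```python
import json

# Values the documentation lists for each setting.
ALLOCATION_ENABLE = ("all", "primaries", "new_primaries", "none")
REBALANCE_ENABLE = ("all", "primaries", "replicas", "none")


def routing_settings_body(allocation=None, rebalance=None, scope="persistent"):
    """Build the JSON body for PUT /_cluster/settings (hypothetical helper).

    Nothing is sent to a cluster; the caller decides how to issue the request.
    """
    if scope not in ("transient", "persistent"):
        raise ValueError(f"unknown scope: {scope!r}")
    settings = {}
    if allocation is not None:
        if allocation not in ALLOCATION_ENABLE:
            raise ValueError(f"invalid allocation.enable value: {allocation!r}")
        settings["cluster.routing.allocation.enable"] = allocation
    if rebalance is not None:
        if rebalance not in REBALANCE_ENABLE:
            raise ValueError(f"invalid rebalance.enable value: {rebalance!r}")
        settings["cluster.routing.rebalance.enable"] = rebalance
    return json.dumps({scope: settings}, indent=2)


if __name__ == "__main__":
    # The two experiments described above, as request bodies.
    print(routing_settings_body(allocation="new_primaries"))
    print(routing_settings_body(rebalance="none"))
```

Note that the two settings do not accept the same values: `new_primaries` is only valid for allocation.enable, and `replicas` only for rebalance.enable.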

I tried reading this https://www.elastic.co/blog/every-shard-deserves-a-home
but it seems to use rebalancing as a synonym for reallocation.

So my questions are:

1) What is the difference between the allocation.enable setting and the rebalance.enable setting?
2) Which setting would be better for avoiding the movement of old shards when an empty node comes online (but maybe not when watermark.high is reached)?
3) Can the data in the shards of a single index become unbalanced (one shard much bigger than another), and would ES try to fix it, or does it only move shards around?


(Steve Crickett) #2

I'm not sure if the behaviour has been tweaked since the blog post was written. The documentation in the latest release does go into a lot of detail.

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/shards-allocation.html#_shard_rebalancing_settings

This is my understanding of it.
Allocation controls how unallocated shards are treated and when ES is allowed to assign them to a node (i.e. increasing total consumed storage by creating new shards), whereas rebalancing is when ES attempts to even out the shard count across the nodes by relocating existing shards from one node to another.

When a new empty node joins the cluster, ES is going to want to move shards to it so that it has an equal share of the shards. Disabling rebalancing would prevent this, but AFAIK it would also prevent moving data when a watermark is hit.
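
To see how uneven the shard counts actually are after a node joins, something like the sketch below could help. It assumes rows shaped like the output of `GET _cat/allocation?format=json` (string values, with a possible `UNASSIGNED` row); the function name `shard_spread` and the sample node names and numbers are made up for illustration.

```python
def shard_spread(rows):
    """Return (shards per node, max - min) from _cat/allocation-style rows."""
    counts = {
        row["node"]: int(row["shards"])
        for row in rows
        if row["node"] != "UNASSIGNED"  # skip the row for unassigned shards
    }
    if not counts:
        return counts, 0
    return counts, max(counts.values()) - min(counts.values())


if __name__ == "__main__":
    # Hypothetical cluster: two full nodes and one freshly added empty node.
    sample = [
        {"node": "node-1", "shards": "10"},
        {"node": "node-2", "shards": "10"},
        {"node": "node-3", "shards": "0"},
    ]
    counts, spread = shard_spread(sample)
    print(counts, spread)
```

A large spread with rebalancing disabled would be consistent with the behaviour described above: the new node stays empty until rebalancing is allowed again.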

Can shards in a single index become unbalanced? I'm not sure. I haven't seen that happen in my clusters, but then I send similarly sized documents into my clusters. I guess if you have documents of very different sizes, more of the large documents could end up in one shard than in another. At a guess, segment merges will keep the shard sizes the same, but the shards will hold a different number of documents relative to each other. Perhaps someone from ES can elaborate on that one.

Steve


#3

So, what you are saying is that reallocation and rebalancing are the same. That's the conclusion I came to myself too. But even though the setting is called allocation.enable, new_primaries affects reallocation too. Hence my question of whether there is any difference.

I just checked, and new_primaries seems to disallow allocation of replicas in addition to all reallocation, while setting rebalance to none only disallows reallocation, so I guess that's the difference.

For the watermark, my guess is that you would have to enable reallocation manually when the watermark is reached. I still haven't tested it.

Seems like putting an index out of balance is easy. Create two nodes, put a shard on each, take one of the nodes offline, add some documents, bring the node back online, and voila! No rebalancing inside the index seems to be happening, though that could have something to do with its size.
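
A rough way to measure that kind of imbalance is to compare primary shard sizes per index. The sketch below assumes rows shaped like the output of `GET _cat/shards?format=json&bytes=b`, where `store` is a byte count given as a string; the function name `primary_size_skew`, the index name, and the sizes are hypothetical.

```python
from collections import defaultdict


def primary_size_skew(rows):
    """Map each index to (largest primary size / smallest primary size)."""
    sizes = defaultdict(list)
    for row in rows:
        # Only started primaries report a store size; skip everything else.
        if row["prirep"] == "p" and row.get("store") is not None:
            sizes[row["index"]].append(int(row["store"]))
    skew = {}
    for index, values in sizes.items():
        smallest = min(values)
        skew[index] = max(values) / smallest if smallest > 0 else float("inf")
    return skew


if __name__ == "__main__":
    # Hypothetical two-shard index where one shard received more documents.
    sample = [
        {"index": "test", "shard": "0", "prirep": "p", "store": "1000"},
        {"index": "test", "shard": "1", "prirep": "p", "store": "4000"},
        {"index": "test", "shard": "1", "prirep": "r", "store": None},
    ]
    print(primary_size_skew(sample))
```

A ratio close to 1 means the primaries are roughly the same size; a much larger ratio points at the kind of skew described above.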

Thank you for your answers!


(Steve Crickett) #4

Not quite; they are subtly different. One is the creation of new shards within the cluster, while the other is the distribution of existing shards around the cluster, I think.

