Shard Balancing

While reviewing our Elasticsearch cluster today I noticed that the
shards for one of the indexes didn't appear to be evenly balanced
across the nodes. After speaking with another developer, we noticed
that the total number of shards per node, regardless of index, was
roughly the same. This surprised me, as I would have assumed that
it would balance an even number of shards per index per node
rather than an even number of shards per node. My concern with the
latter is that there is no guarantee that the indexes themselves
are the same size or receive the same number of queries. As such,
you could essentially overload one box.

Take our cluster for example. At the time I noticed this, it consisted
of 5 indexes (5 shards and 1 replica each) and a total of 5 nodes.
One of the indexes was roughly 40GB, three were about 10GB each, and
the final index was about 10MB.

What I noticed was that for the large index (which is also the one
with the most load), 5 of the shards were on one node instead of the
2 shards per node that I would have assumed. On the smaller index I
also noticed a node holding 3 shards instead of 2. I then started
counting the total number of shards per node, regardless of index,
and realized that each node had 10 shards.
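
The numbers above can be checked with a few lines of arithmetic. This is only a sketch of the counting, using the cluster figures from this post; the variable names are made up for illustration.

```python
# Cluster figures taken from the post above.
nodes = 5
indexes = 5
primaries_per_index = 5
replicas_per_primary = 1

# Every primary has one replica, so each index has 10 shard copies.
copies_per_index = primaries_per_index * (1 + replicas_per_primary)
total_copies = indexes * copies_per_index

# What a count-only balancer aims for: the same total on every node.
shards_per_node = total_copies // nodes

# What per-index balancing would give: 2 copies of each index per node.
expected_per_index_per_node = copies_per_index // nodes

# What was observed for the large index on one node.
observed_on_one_node = 5
share_on_one_node = observed_on_one_node / copies_per_index

print(total_copies, shards_per_node, expected_per_index_per_node, share_on_one_node)
```

So 10 shards per node is exactly what an even total looks like, even though half of the large index sits on one box.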

I wouldn't normally think much of it if all of the indexes were the
same size, but ours aren't. I'm concerned that 50% of one index is on
one node instead of being distributed evenly across all five. I'm not
worried about disk space for now; what concerns me is the distribution
of search load, since this one box would also take the majority of the
search traffic.

Another note: when I deleted some of the indexes that we no longer
needed, the other indexes started to rebalance. That is another
indicator that shards are balanced per node across all indexes and
not per index.

Currently in ES, the distribution of shards among the nodes is indeed
based only on the number of shards, as you have noticed. The distribution
algorithm does not take the size of the shards into account. If you have
indices of significantly different sizes, you can end up with the
unbalanced-nodes problem you're describing.
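
To make the effect concrete, here is a rough sketch that weighs a layout by shard size instead of shard count. The placement table is hypothetical (the actual layout was not posted); it only matches the reported facts: 10 shards on every node, 5 shards of the large index on one node. It also assumes each index's size is split evenly over its 5 primaries.

```python
# Hypothetical number of shard copies per node and index (not the real layout).
placement = {
    "node0": {"big": 5, "a": 2, "b": 1, "c": 1, "small": 1},
    "node1": {"big": 2, "a": 2, "b": 2, "c": 2, "small": 2},
    "node2": {"big": 1, "a": 2, "b": 2, "c": 2, "small": 3},
    "node3": {"big": 1, "a": 2, "b": 2, "c": 3, "small": 2},
    "node4": {"big": 1, "a": 2, "b": 3, "c": 2, "small": 2},
}

# Assumed size of one shard copy in MB: index size divided by 5 primaries.
shard_mb = {
    "big": 40 * 1024 // 5,
    "a": 10 * 1024 // 5,
    "b": 10 * 1024 // 5,
    "c": 10 * 1024 // 5,
    "small": 10 // 5,
}

# Count-based view: what the balancer looks at.
shards_per_node = {node: sum(c.values()) for node, c in placement.items()}

# Size-based view: what the disks (and likely the search load) see.
mb_per_node = {
    node: sum(n * shard_mb[index] for index, n in c.items())
    for node, c in placement.items()
}

heaviest = max(mb_per_node, key=mb_per_node.get)
for node in placement:
    print(node, shards_per_node[node], "shards", mb_per_node[node], "MB")
print("heaviest node:", heaviest)
```

Under these assumptions every node reports 10 shards, yet node0 carries more than twice the data of the lightest node.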

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

(In reply to jjasinek's original post of Fri, Jan 20, 2012, 3:42 PM.)

I have only been working with ES for a few weeks now, so take my advice
with a grain of salt.

A possible mitigation might be to bind each node in the cluster to a
specific index using the index shard allocation settings described here:
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html.
You could scale up or down for each index by reducing/increasing the
resources per node, or increasing the number of nodes. The nodes could run
individually on hosts or in sets on hosts, whichever makes sense based on
the resources and redundancy needed.
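
As a rough illustration of the idea, the sketch below builds the body of an index settings update that restricts an index to nodes carrying a given custom attribute. The attribute name `tag` and the value `big_index_nodes` are made up for this example, and the helper function is not part of any client library. The setting key follows the `index.routing.allocation.include.<attribute>` pattern from the allocation documentation; how the attribute is declared on the nodes depends on the Elasticsearch version, so check the docs for yours.

```python
import json


def allocation_filter_settings(attribute, values):
    """Build a settings body that limits an index to nodes whose
    custom attribute matches one of the given values."""
    key = f"index.routing.allocation.include.{attribute}"
    return {key: ",".join(values)}


# Hypothetical attribute and value; the nodes meant to host the large
# index would need this attribute set in their own configuration.
body = allocation_filter_settings("tag", ["big_index_nodes"])
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as an update to the settings of the index in question; the smaller indexes would get a different attribute value so that they land on other nodes.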

Mark

Heya, just to add: Yes, currently, elasticsearch will make sure an even
number of shards are allocated across nodes, regardless of which index they
belong to.

(In reply to Mark Waddle, Sat, Jan 21, 2012 at 5:55 AM.)

Shay - out of curiosity - do you plan on making the allocation algorithm
pluggable or adding alternative allocation options?

Thanks,
Otis

(In reply to Shay Banon, Jan 23, 1:52 pm.)

You can already control where indices are placed and how replicas are
distributed externally; see the Elasticsearch documentation on shard
allocation awareness and filtering.

There is a class that deals just with how to balance shards. It's
called EvenShardsCountAllocator
(https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/EvenShardsCountAllocator.java)
and implements ShardsAllocator. It could easily be made pluggable if
needed.

Note, the tricky bit here is to have balancing logic that moves as few
shards as possible while still providing the best distribution.
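
To illustrate the trade-off, here is a minimal count-based rebalancing sketch. It is not the code of EvenShardsCountAllocator; the function name and the starting counts are invented for this example. It moves one shard at a time from the fullest node to the emptiest node and stops as soon as the counts differ by at most one.

```python
def rebalance_by_count(shards_per_node):
    """Greedy sketch: relocate one shard at a time from the fullest to the
    emptiest node until no two nodes differ by more than one shard.
    Returns the final counts and the list of (source, target) moves."""
    counts = dict(shards_per_node)
    moves = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= 1:
            return counts, moves
        counts[fullest] -= 1
        counts[emptiest] += 1
        moves.append((fullest, emptiest))


# Hypothetical unbalanced cluster with 50 shards on 5 nodes.
before = {"node0": 14, "node1": 10, "node2": 10, "node3": 8, "node4": 8}
after, moves = rebalance_by_count(before)
print(after)
print(len(moves), "moves:", moves)
```

In this example only the 4 surplus shards on node0 are relocated. A size-aware or per-index variant would have to keep the number of moves similarly low, since each relocation copies shard data over the network.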

(In reply to Otis Gospodnetic, Mon, Jan 23, 2012 at 10:23 PM.)

You can always create your own allocation module under:
modules/elasticsearch/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/

We are running a custom module that balances approximately within an
index. I think in general, though, there is a plan to make more
resource-aware allocation schemes, i.e. considering not only index size
but also cpu/memory/disk constraints across non-uniform hardware, which
is a superset of this issue.
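
The custom module itself was not posted, so the following is only a guess at what "balancing within an index" could look like, reduced to counting. Each index's shard copies are dealt out round-robin over the nodes, continuing where the previous index stopped so that the per-node totals stay even as well. It does not model the rule that a primary and its replica must live on different nodes, which a real allocator has to respect.

```python
def balance_within_index(copies_per_index, nodes):
    """Spread each index's shard copies round-robin across the nodes.
    Returns {node: {index: number of copies}}."""
    placement = {node: {} for node in nodes}
    offset = 0
    for index, copies in copies_per_index.items():
        for i in range(copies):
            node = nodes[(offset + i) % len(nodes)]
            placement[node][index] = placement[node].get(index, 0) + 1
        # Start the next index on the node after the last one used.
        offset += copies
    return placement


nodes = ["node0", "node1", "node2", "node3", "node4"]
# 5 primaries plus 5 replicas for each index, as in the original post.
copies_per_index = {"big": 10, "a": 10, "b": 10, "c": 10, "small": 10}

placement = balance_within_index(copies_per_index, nodes)
for node, counts in placement.items():
    print(node, counts, "total:", sum(counts.values()))
```

With these inputs every node ends up with 2 copies of each index and 10 shards in total, which is the layout the original poster expected.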

(In reply to Otis Gospodnetic, Jan 23, 12:23 pm.)

@Yooz: cool! Can you share the code? People might find it helpful.

(In reply to Yooz, Mon, Jan 23, 2012 at 10:48 PM.)

Hi there,

@Yooz: Can you share some basic steps on how you implemented your allocator?

I'm currently facing this problem. I have an empty index that ES is giving
the same weight as another index that is 70GB in size.

(In reply to kimchy, Monday, January 23, 2012 8:51:02 PM UTC.)

+1

I had the same problem where I had a tiny index causing the shards of a
large index to be balanced oddly.
I ended up setting the small index's replicas = shards - 1, which will
force the larger index to balance as if there were no other indexes :)

(In reply to MagmaRules, Mon, May 21, 2012 at 10:28 AM.)

Whoops, that should have been "nodes - 1", not "shards - 1".
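
A small sketch of why replicas = nodes - 1 works, assuming a 5-node cluster and a small index with 5 primaries (both numbers are examples, and the helper name is made up). Since Elasticsearch does not place two copies of the same shard on one node, an index with nodes - 1 replicas has to put exactly one copy of every shard on every node. It then adds the same number of shards to each node, so the count-based balancer has to spread the remaining indexes evenly on their own.

```python
import json


def replicas_for_full_spread(node_count):
    """Replica count that puts one copy of every shard on every node."""
    return node_count - 1


node_count = 5   # example cluster size
primaries = 5    # example shard count of the small index

replicas = replicas_for_full_spread(node_count)
total_copies = primaries * (1 + replicas)
copies_per_node = total_copies // node_count  # one copy of each shard

# Body for an index settings update on the small index.
settings_body = {"index": {"number_of_replicas": replicas}}

print(replicas, total_copies, copies_per_node)
print(json.dumps(settings_body))
```

The cost is extra disk space and indexing work for the small index on every node, which is usually acceptable only because the index is tiny.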

(In reply to John Cwikla, Mon, May 21, 2012 at 2:32 PM.)