While reviewing our ElasticSearch cluster today I noticed that the
shards for one of the indexes didn't appear to be evenly balanced
across the nodes. After speaking with another developer, we noticed
that the total number of shards per node, regardless of index, was
roughly the same. This was surprising to me, as I would have assumed
it would balance an even number of shards per index per node
rather than an even number of shards per node overall. My concern with
the latter is that there is no guarantee that the indexes
themselves are the same size or receive the same volume of queries.
As such, you could essentially overload one box.
Take our cluster, for example. At the time I noticed this, it consisted
of 5 indexes (5 shards and 1 replica each) and a total of 5 nodes.
One of the indexes is roughly 40GB, three were about 10GB, and the
final index was about 10MB.
What I noticed was that for the large index (the one with the
most load), 5 of its shards were on one node instead of the 2 shards
per node that I would have expected. On a smaller index I even
noticed that one node held 3 shards instead of 2. I then started
counting the total number of shards per node, regardless of index,
and realized that each node had 10 shards.
I wouldn't normally think much of it if all of the indexes were the
same size, but ours aren't. I'm concerned that 50% of one index is on
one node instead of being distributed evenly across all five. I'm not
worried about disk space right now; what concerns me is the
distribution of search load. That one box would take the majority of
the search traffic.
Another note: when I deleted some of the indexes that we no longer
needed, the other indexes started to rebalance. This is another
indicator that shards are balanced per node across all indexes, not
per index.
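The situation described above can be checked with a quick sketch. The shard layout below is hypothetical, chosen only to match the counts reported in this thread; it shows that even when every node holds the same total number of shards, one index's shards can still pile up on a single node.

```python
from collections import Counter

# Hypothetical layout matching the thread: 5 nodes, 5 indexes,
# each index has 5 primary shards + 1 replica = 10 shard copies.
# Total = 50 copies, so a per-node-total balancer aims for 10 shards
# on each node -- regardless of which index they belong to.
layout = {
    "node0": ["big"] * 5 + ["mid1"] * 2 + ["mid2"] * 2 + ["tiny"] * 1,
    "node1": ["big"] * 2 + ["mid1"] * 2 + ["mid3"] * 3 + ["tiny"] * 3,
    "node2": ["big"] * 1 + ["mid1"] * 3 + ["mid2"] * 3 + ["tiny"] * 3,
    "node3": ["big"] * 1 + ["mid1"] * 3 + ["mid3"] * 4 + ["tiny"] * 2,
    "node4": ["big"] * 1 + ["mid2"] * 5 + ["mid3"] * 3 + ["tiny"] * 1,
}

# Per-node totals are perfectly balanced: 10 shards everywhere...
totals = {node: len(shards) for node, shards in layout.items()}
assert all(t == 10 for t in totals.values())

# ...yet the big index is badly skewed: node0 holds 5 of its 10 copies.
big_per_node = {node: Counter(shards)["big"] for node, shards in layout.items()}
print(big_per_node)  # node0 holds 50% of the big index
```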
Currently in ES, distribution of shards among the nodes is indeed based
only on the number of shards, as you have noticed. The distribution
algorithm does not take the size of the shards into account. If you
have indices of significantly different sizes, you can run into the
unbalanced-nodes problem you're describing.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
I have only been working with ES for a few weeks now, so take my advice
with a grain of salt.
A possible mitigation might be to bind each node in the cluster to a
specific index using the index shard allocation filtering described here:
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html.
You could then scale each index up or down by reducing/increasing the
resources per node, or by changing the number of nodes. The nodes could
run individually on hosts or in sets on hosts, whichever makes sense
based on the resources and redundancy needed.
Mark
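A rough sketch of what that could look like with allocation filtering (the attribute name `tag`, the group value, and the index name here are made up for illustration; check the linked page for the exact settings):

```shell
# elasticsearch.yml on the nodes reserved for the big index
# (a custom node attribute; "tag" and "big_index_group" are arbitrary):
#   node.tag: big_index_group

# Then pin the index's shards to those nodes via an index setting:
curl -XPUT 'http://localhost:9200/big_index/_settings' -d '{
  "index.routing.allocation.include.tag": "big_index_group"
}'
```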
Heya, just to add: Yes, currently, elasticsearch will make sure an even
number of shards are allocated across nodes, regardless of which index they
belong to.
You can always create your own allocation module under:
modules/elasticsearch/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/
We are running a custom module that balances approximately within an
index. In general, though, I think there is a plan for more
resource-aware allocation schemes, i.e. not only index size but also
CPU/memory/disk constraints across non-uniform hardware, which is a
superset of this issue.
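The custom module itself isn't shown in this thread, but its core idea (balancing each index independently rather than balancing the cluster-wide shard total) can be sketched in a few lines. This is an illustrative round-robin, not the actual Java allocator, and it ignores the real-world constraint that a shard's primary and replica must land on different nodes:

```python
from collections import defaultdict

def allocate_per_index(indexes, nodes):
    """Assign each index's shard copies round-robin across nodes,
    so every index is individually balanced (counts differ by at most 1)."""
    layout = defaultdict(list)  # node -> list of (index_name, copy_number)
    for index_name, n_copies in indexes.items():
        for copy in range(n_copies):
            layout[nodes[copy % len(nodes)]].append((index_name, copy))
    return layout

# The thread's cluster: 5 indexes x (5 shards + 1 replica) = 10 copies each.
indexes = {"big": 10, "mid1": 10, "mid2": 10, "mid3": 10, "tiny": 10}
nodes = ["node0", "node1", "node2", "node3", "node4"]
layout = allocate_per_index(indexes, nodes)

# Every node ends up with exactly 2 copies of every index.
for node in nodes:
    counts = defaultdict(int)
    for index_name, _ in layout[node]:
        counts[index_name] += 1
    assert all(c == 2 for c in counts.values())
```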
On Monday, January 23, 2012 8:51:02 PM UTC, kimchy wrote:
@Yooz: Cool! Can you share the code? People might find it helpful.
I had the same problem, where a tiny index caused the shards of a
large index to be balanced oddly.
I ended up setting the small index's replicas = shards - 1, which
forces the larger index to balance as if there were no other indexes.
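The arithmetic behind that trick, using this thread's numbers (5-node cluster, small index with 5 shards): with replicas = shards - 1 = 4, each shard has 5 copies, and since no two copies of the same shard may share a node, every node is forced to hold one copy of every shard. The small index then occupies all nodes uniformly, leaving identical headroom everywhere for the large index. Note this works because copies-per-shard happens to equal the node count here (shards == nodes); on a different cluster the relevant target is replicas = nodes - 1.

```python
# Small index from the thread: 5 primary shards on a 5-node cluster.
shards, nodes = 5, 5
replicas = shards - 1            # the trick: 4 replicas per shard
copies_per_shard = replicas + 1  # 5 copies of each shard

# Each shard has exactly as many copies as there are nodes, and no node
# may hold two copies of the same shard, so every node must hold one
# copy of every shard -- a perfectly uniform layout.
assert copies_per_shard == nodes

copies_per_node = shards * copies_per_shard // nodes
print(copies_per_node)  # every node holds 5 copies of the small index
```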
On Mon, May 21, 2012 at 10:28 AM, MagmaRules mfcoxo@gmail.com wrote:
Hi there,
@Yooz: Can you share some basic steps on how you implemented your
allocator?
I'm currently facing this problem: I have an empty index that ES is
giving the same allocation weight as another index that is 70GB in size.