I am having a similar experience. In my case, 1 node is pegged while
others have only 25-50% CPU utilization. That over-burdened node is
hosting 4 index shards while the others have 3; it is not the master.
None of the nodes are swapping.
Running v0.18.7. I have only a single index: 5 shards, 1 replica. ES
Head reports it as ~1mil documents and ~1.8g in size. The index is
configured to be entirely in memory with the S3 gateway. ES_MAX_MEM is
at 4g. I'm running it on a 3-node cluster of m1.large instances.
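(For reference, a configuration like the one described might look roughly like this in elasticsearch.yml; the bucket name is a placeholder, and the S3 gateway requires the cloud-aws plugin. This is a sketch of the setup, not a copy of the actual config:)

```yaml
# Illustrative sketch only; values are placeholders
gateway.type: s3
gateway.s3.bucket: my-es-gateway-bucket   # hypothetical bucket name
index.store.type: memory                  # keep the index entirely in memory
index.number_of_shards: 5
index.number_of_replicas: 1
```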
I started with a bulk insert of all the data. After that, a couple of
schema tweaks resulted in additional loads of a portion of the data
set. The query being run is "match_all" with a "geo_distance"
filter applied. At this point no other fields are being filtered. No
writes are happening during this read test.
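(A query of that shape would look roughly like the following; the index name, field name, distance, and coordinates are all illustrative placeholders, not values from this thread:)

```shell
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "geo_distance": {
          "distance": "10km",
          "location": { "lat": 40.71, "lon": -74.0 }
        }
      }
    }
  }
}'
```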
Why is this single machine the bottleneck? How can I go about
distributing this load more evenly? Is there a minimum cluster size or
specific cluster configuration I should be using for this kind of
load? I'm curious how I can even go about diagnosing the issue.
On Feb 27, 5:09 am, Shay Banon <kim...@gmail.com> wrote:
As Berkay mentioned, the balancing scheme is across indices, though I have started to work on a smarter one that will handle the cross-indices case better. In 0.19, I added a band-aid (for now) where you can force, at the index level, a max number of shards per node, so you can force even balancing (take total_number_of_shards / number_of_nodes, where total_number_of_shards is (1 + number_of_replicas) * number_of_shards).
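(As a sketch of that arithmetic, using illustrative numbers rather than values from this thread, and rounding up so the cap never strands a shard:)

```shell
# Illustrative values: 5 primary shards, 1 replica, 3 nodes
SHARDS=5; REPLICAS=1; NODES=3
TOTAL=$(( (1 + REPLICAS) * SHARDS ))          # total_number_of_shards = 10
PER_NODE=$(( (TOTAL + NODES - 1) / NODES ))   # ceiling(10 / 3) = 4
echo "$TOTAL $PER_NODE"
```

In 0.19, the resulting number would then go into the index-level setting index.routing.allocation.total_shards_per_node.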
On Monday, February 27, 2012 at 5:12 AM, Berkay Mollamustafaoglu wrote:
ES does not distribute shards to achieve an even number of shards per index; it distributes them so that each node has an even number of shards in total. Currently in ES, when you have indices with drastically different sizes, you end up with unbalanced load.
Why do you have a separate index with 20 shards for the index with 106 documents? Is it supposed to grow? The easiest way to resolve the problem is to have only a single index, with different types standing in for the 2nd and 3rd indices, rather than having separate indices.
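(The single-index-with-types approach would look roughly like this; the index and type names are hypothetical:)

```shell
# One index, multiple types, instead of separate indices per type
curl -XPUT 'localhost:9200/combined/typeA/1' -d '{"field": "value"}'
curl -XPUT 'localhost:9200/combined/typeB/1' -d '{"field": "value"}'
# Queries can still be scoped to a single type within the shared index
curl -XGET 'localhost:9200/combined/typeB/_search' -d '{"query": {"match_all": {}}}'
```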
mberkay on yahoo, google and skype
On Sun, Feb 26, 2012 at 7:36 PM, Chris Rode <cir...@gmail.com> wrote:
This new Skitch screenshot illustrates what I was saying about the index-node
On Feb 27, 10:25 am, Chris Rode <cir...@gmail.com> wrote:
The image attached via Skitch shows the cluster and shard distribution at
the time of the jstack trace.
We have 4 nodes, with 3 indexes and a replication factor of 3. Each
index has 20 shards. These are running across two availability zones.
When we turn down the replication factor, the nodes get an even number
of shards; however, they do not get an even number of shards per index.
This means that indexes with higher load do not utilise the entire cluster.
We are currently using 0.18.7
On Feb 27, 6:22 am, Shay Banon <kim...@gmail.com> wrote:
The setting only applies in 0.19, but you did not answer the question with a bit more detailed information (how many indices, how many shards, are they evenly balanced)?
On Wednesday, February 22, 2012 at 1:32 AM, Chris Rode wrote:
UPDATE: Unfortunately, applying this transiently baulked; however, I
have applied it manually to each node in the cluster. After the dust
settled, the shards still aren't distributed fairly.
We have set the setting to 13 (20 (shards) * 4 (original + copies) / 7 (nodes)
rounds up to 12, so this should be fine), and some nodes still have 20 shards on an
We are using 0.18.6; is this a 0.18.7-only setting, by any chance?
On Feb 21, 5:16 pm, Chris Rode <cir...@gmail.com> wrote:
Yes, we do; we have 3, each with 20 shards. Looking with
elasticsearch-head I can see that each index is not spread evenly. I
tried applying index.routing.allocation.total_shards_per_node (set to 15) transiently.
Anything else I should try?
On Feb 21, 12:12 am, Shay Banon <kim...@gmail.com> wrote:
Do you have multiple indices in the cluster? If so, maybe you end up with certain indices allocated only on a subset of the cluster?
On Monday, February 20, 2012 at 6:01 AM, Chris Rode wrote:
I have not been able to find anything to explain why this may be occurring.
We have a cluster of 7 nodes on ephemeral EC2 with an S3 gateway set to 10 minutes. When we put the cluster under load, though, only 3 of the 7 nodes get any load on them, and that quickly builds up to the point where queries are timing out.
We originally had a replication factor of 3 set, and have just updated it to 6 (shard count is 20 per index); however, this does not seem to have done the trick.
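(Updating the replica count on a live index can be done through the index settings API; the index name here is an illustrative placeholder:)

```shell
curl -XPUT 'localhost:9200/myindex/_settings' -d '
{ "index": { "number_of_replicas": 6 } }'
```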