Rebalancing of shards

Is there a way to force ES to balance shards considering each index
individually, instead of all the shards in the cluster as a whole?

I have an index-per-month scheme (that goes all the way back to ~2009). Some
months are very unbalanced. For example, the current month's index uses only 4
nodes instead of 8, and 2 of its primary shards sit on the same node.
I use preference=_primary_first, so primary shard allocation is important
for me (we're using replicas just for failover).
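
(For illustration, our queries are roughly of this shape; the index names and
the match_all body are just placeholders:)

    curl -XGET 'localhost:9200/docs-2013.07,docs-2013.08/_search?preference=_primary_first' -d '{
      "query" : { "match_all" : { } }
    }'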

The most recent months are queried more than older ones, but a good
percentage of the queries touch all the indices.

Is there any way to ease this situation that does not involve manually
moving shards around?

Thanks!

Felipe Hummel


Which version of ES are you using? 0.90 does that by default, though...

simon


Sorry about missing that information; we're on 0.90.2.

I've attached a view from the HEAD plugin showing a single index that uses
just 4 nodes instead of all 8.


Can you provide the total number of shards you have and the average number
of shards per node?
Here are some other things I'd like to know:

  • Do you have any custom settings related to allocation on your cluster,
    i.e. do you restrict the number of shards per node or something like that?
  • Can you paste the output of curl -XGET localhost:9200/_cluster/settings?
  • Did you upgrade from 0.20 without reindexing? If that is the case, I think
    the cluster will not start moving things around like crazy, to prevent you
    from going down. You can still raise
    cluster.routing.allocation.balance.index
    to something higher than 0.50 so that index-level balance carries more
    weight. That might trigger some relocations, but with so many indices it is
    not guaranteed which index gets rebalanced; over time, though, you should
    reach a point where you are better balanced in terms of indices (see also
    the cluster shard allocation docs; just make sure you don't lower the
    threshold). A sketch of how to bump that setting follows this list.
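
For reference, a minimal sketch of how that could look via the cluster update
settings API (the 0.7 is only an example value, not a recommendation):

    # check the current settings
    curl -XGET 'localhost:9200/_cluster/settings?pretty'

    # give per-index balance more weight than the 0.50 default (transient
    # settings are lost on a full cluster restart)
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient" : {
        "cluster.routing.allocation.balance.index" : 0.7
      }
    }'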

In general it would be useful to see the shard-wise layout of your entire
cluster to better tell what is going on. Can you provide that info?

simon


  • Total number of shards is 672 across 65 indices on 8 nodes; replicas are
    (as of today) set to 1 (before my first e-mail it was 2); total size is 80GB.

  • No custom settings related to allocation

  • _cluster/settings:
    {
      "persistent" : { },
      "transient" : { }
    }

  • No, we created a new cluster to upgrade to 0.90.*

  • We're going to try increasing cluster.routing.allocation.balance.index.

  • I'm attaching 4 screenshots from the HEAD plugin; I guess that's the best
    way to see it. Replicas used to be set to 2, but yesterday we set
    replicas=0 (for some tests) and then replicas=1, which is what the
    screenshots show. The worst-balanced index we fixed manually, so it won't
    show up in the screenshots.
    The only "hole" I see is in 1.png: the third index (left to right) does not
    use one of the nodes. Maybe the replicas=0 then replicas=1 cycle fixed some
    of the other unbalanced indices.

On an (almost) side question: this "balance problem" was found because we were
investigating a strange behavior. If we execute the same query repeatedly
(dozens of times), the latency is low the majority of the time, but sometimes
it spikes to 2x-10x the original time (latency being the "took" time returned
by ES). We've tracked the queries, and it is probably not related to
concurrency with other queries (the system was almost idle when the experiment
was run), although forcing other queries at the same time seems to make the
problem worse.
Here is a sample of latencies from running the same query back to back (no
pause in between); you can see the variance:

17 ms
17 ms
10 ms
19 ms
44 ms
124 ms
11 ms
11 ms
182 ms
17 ms
11 ms
169 ms
13 ms
11 ms
12 ms
69 ms
42 ms
21 ms
12 ms
11 ms

We were following everything through the bigdesk plugin. Sometimes the search
thread pool's queue and count metrics spike. While running 6 streams of
queries in parallel (one distinct query per stream), we've seen the "queue"
metric spike up to 45.
In our setup, a simple query searches at least 12 indices (the last 12
months), i.e. around 12 primary shards (we only search with _primary_first).
We run ES on 8 EC2 m1.large instances with 2 cores each, so the search thread
pool has a fixed size of 6
(http://www.elasticsearch.org/guide/reference/modules/threadpool/).

Does that mean that a query spanning 12 shards on one node would consume the
6 threads in the pool right away, and the other 6 "tasks" for the remaining
shards would wait in the queue?

We've been monitoring the CPU on the 8 nodes. It rarely goes higher than 40%
(generally during larger/full GCs). Each instance has 7GB of RAM and the JVM
is set to use 3GB (memory usage normally climbs from ~1GB up to ~2.2GB, then a
full GC brings it back to ~1GB).

Maybe we could increase the number of threads in the search pool, since we
have so many indices/shards per query?
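
For illustration, what I have in mind is something along these lines in
elasticsearch.yml (the numbers are just examples, nothing we've tested):

    # example only: raise the fixed search pool above its default of
    # 3x the number of cores (6 on our 2-core boxes)
    threadpool:
        search:
            type: fixed
            size: 12
            queue_size: 1000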

Any thoughts?
Thanks!

Felipe Hummel


OK, you have a lot of info/questions in that post; let me try to answer
inline.

On Friday, August 23, 2013 4:31:11 PM UTC+2, Felipe Hummel wrote:

  • Total number of shards is 672 across 65 indices on 8 nodes; replicas are
    (as of today) set to 1 (before my first e-mail it was 2); total size is 80GB.

  • No custom settings related to allocation

  • _cluster/settings:
    {
      "persistent" : { },
      "transient" : { }
    }

  • No, we created a new cluster to upgrade to 0.90.*

  • We're going to try increasing cluster.routing.allocation.balance.index.

Given the screenshots you showed me this might not necessarily help, but you
can try it. In your setup I'd rather think about reducing the number of shards
so that a single request hits a minority of the nodes you have. Currently you
have 5 shards per index and 8 nodes; I would rather go with either 2 or 3
shards to make sure you only hit some of the nodes. I am not sure how big your
indices are, but they seem rather smallish? This might also reduce memory
consumption. This way you might get a more stable cluster and more stable
response times, since two requests are less likely to compete for resources.
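
Just to illustrate the idea (the template name and index pattern below are
made up, and existing indices would need reindexing to change their shard
count), new monthly indices could pick up fewer shards via an index template:

    curl -XPUT 'localhost:9200/_template/monthly_indices' -d '{
      "template" : "docs-*",
      "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 1
      }
    }'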

  • I'm attaching 4 screenshots from the HEAD plugin; I guess that's the best
    way to see it. Replicas used to be set to 2, but yesterday we set
    replicas=0 (for some tests) and then replicas=1, which is what the
    screenshots show. The worst-balanced index we fixed manually, so it won't
    show up in the screenshots.
    The only "hole" I see is in 1.png: the third index (left to right) does not
    use one of the nodes. Maybe the replicas=0 then replicas=1 cycle fixed some
    of the other unbalanced indices.

With that many indices I think there is almost no way to prevent what you are
seeing. The algorithm can only balance towards a next state that improves your
balance, and at some point you can't move forward without major re-shuffling,
which we don't do, to make sure your cluster doesn't break.

On an (almost) side question: this "balance problem" was found because we were
investigating a strange behavior. If we execute the same query repeatedly
(dozens of times), the latency is low the majority of the time, but sometimes
it spikes to 2x-10x the original time (latency being the "took" time returned
by ES). We've tracked the queries, and it is probably not related to
concurrency with other queries (the system was almost idle when the experiment
was run), although forcing other queries at the same time seems to make the
problem worse.
Here is a sample of latencies from running the same query back to back (no
pause in between); you can see the variance:

17 ms
17 ms
10 ms
19 ms
44 ms
124 ms
11 ms
11 ms
182 ms
17 ms
11 ms
169 ms
13 ms
11 ms
12 ms
69 ms
42 ms
21 ms
12 ms
11 ms

We were following everything through the bigdesk plugin. Sometimes the search
thread pool's queue and count metrics spike. While running 6 streams of
queries in parallel (one distinct query per stream), we've seen the "queue"
metric spike up to 45.
In our setup, a simple query searches at least 12 indices (the last 12
months), i.e. around 12 primary shards (we only search with _primary_first).
We run ES on 8 EC2 m1.large instances with 2 cores each, so the search thread
pool has a fixed size of 6
(http://www.elasticsearch.org/guide/reference/modules/threadpool/).

I think that by reducing the number of shards you will also go some way
towards fixing that problem, since you will get better load balancing etc. Let
me ask: why do you do _primary_first searches? Any reason for this?

Does that mean that a query spanning 12 shards on one node would consume the
6 threads in the pool right away, and the other 6 "tasks" for the remaining
shards would wait in the queue?

YES
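
If you want to watch that happening, the per-node thread pool counters
(active, queue, rejected) should show it; something along these lines, though
the exact stats flags can vary a bit between versions:

    curl -XGET 'localhost:9200/_nodes/stats?thread_pool=true&pretty'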

We've been monitoring the CPU on the 8 nodes. It rarely goes higher than 40%
(generally during larger/full GCs). Each instance has 7GB of RAM and the JVM
is set to use 3GB (memory usage normally climbs from ~1GB up to ~2.2GB, then a
full GC brings it back to ~1GB).

Maybe we could increase the number of threads in the search pool, since we
have so many indices/shards per query?

Yeah, I personally would reduce the number of shards first...

simon


Our complete dataset is about ~60M documents (growing by ~2.5M/month). Each
monthly index ranges from 500K documents (the older ones) to 2M (currently). A
good percentage of the queries search all 65 months/indices, and 100% of them
search the last 12 months/indices. With 2M-document indices, we thought 5
shards (=> 1 shard ~= 400K documents) would be a good setup.

We use _primary_first because when we first added replicas we saw an increase
in query latency. We attributed the increase to the replicas "stealing" memory
from the primary shards by sharing the available OS cache.
As our number of queries/second is very low, we don't need replicas for load
balancing, just for failover. In other words, it is an effort to keep the
shards we actually search in the OS cache as much as we can. Some queries take
> 1 second when cold and then go down to < 200 ms when cached.

We're going to study how (and by how much) to decrease the number of shards
and (maybe) increase the search thread pool.

Thanks for the thoughtful answers!

Felipe Hummel
