Rebalancing of shards

Is there a way to force ES to balance shards for each index individually,
instead of considering all the shards in the cluster together?

I have an index-per-month scheme (going all the way back to ~2009). Some
months are very unbalanced: for example, in the current month's index only 4
nodes are used instead of 8, and 2 primary shards are on the same node.
I use preference=_primary_first, so primary shard allocation is important
for me (we're using replicas just for failover).
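A query using that preference looks roughly like this (the index name and the match_all query are placeholders, not our real setup):

```shell
# Hypothetical: ask ES to hit primary shards first for this search.
curl -XGET 'localhost:9200/logs-2013.08/_search?preference=_primary_first' -d '
{
  "query": { "match_all": {} }
}'
```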

The last months are more used than older ones, but a good percentage of the
queries touch all the indices.

Is there any way to ease this situation that does not involve manually
moving shards around?

Thanks!

Felipe Hummel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

which version of ES are you using? 0.90 does that by default though...

simon


Sorry about missing that information; we're on 0.90.2.

I attached a view from the HEAD plugin, showing a single index where it was
using just 4 nodes instead of all 8.


Can you provide the total number of shards you have and the average number
of shards per node?
Here are some other things I want to know:

  • do you have any custom settings related to allocation on your cluster?
    i.e. do you restrict the # of shards per node or something like this?
  • can you paste the output of curl -XGET localhost:9200/_cluster/settings
  • did you upgrade from 0.20 without reindexing? If that is the case, I think
    the cluster will not start moving things around like crazy, to prevent you
    from going down. You can, however, raise
    cluster.routing.allocation.balance.index
    to something higher than 0.50 to make the per-index balance more important.
    That might start some relocations, but it's not guaranteed which index gets
    rebalanced when you have so many of them. Over time, though, you should get
    to a point where you are more balanced in terms of indices. (See also
    http://www.elasticsearch.org/guide/reference/api/admin-cluster-update-settings/
    and just make sure you don't lower the threshold.)

In general it would be useful to see the shard-wise layout of your entire
cluster to tell better what is going on. Can you provide this info?

simon
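The setting mentioned above can be changed at runtime through the cluster update settings API. A hedged sketch, assuming a node reachable on localhost:9200 (the 0.65 value is only an illustration, not a recommendation):

```shell
# Hypothetical: raise the per-index balance weight as a transient
# setting (transient settings revert after a full cluster restart).
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{
  "transient": {
    "cluster.routing.allocation.balance.index": 0.65
  }
}'
```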


  • Total number of shards is 672 across 65 indices on 8 nodes; replicas are
    (as of today) set to 1 (before my first e-mail it was 2); total size is 80GB.

  • No custom settings related to allocation

  • _cluster/settings:
    {
      "persistent" : { },
      "transient" : { }
    }

  • No, we created a new cluster to upgrade to 0.90.*

  • We're gonna try increasing the cluster.routing.allocation.balance.index

  • I'm attaching 4 screens from the HEAD plugin; I guess that's the best way
    to see it. Replicas had been set to 2, but yesterday we set replicas=0 (for
    some tests) and then replicas=1, which is what the screens show. We manually
    fixed the worst-balanced index, so it won't show up in the screens.
    The only "hole" I see is in file 1.png: the third index (left to right)
    does not use one of the nodes. Maybe the replicas=0 then replicas=1 cycle
    fixed a bit of the other unbalanced indices.

On an (almost) side question: this balance problem was found while we were
investigating a strange behavior. If we execute the same query repeatedly
(dozens of times), the latency is low the majority of the time, but
sometimes it spikes to 2x-10x the original time (latency being the "took"
time returned by ES). We've tracked the queries, and it is probably not
related to concurrency with other queries (the system was almost idle when
the experiment was made), although forcing other queries at the same time
seems to make the problem worse.
A sample of latencies from running the same query back to back (no time in
between); you can see the variance:

17 ms
17 ms
10 ms
19 ms
44 ms
124 ms
11 ms
11 ms
182 ms
17 ms
11 ms
169 ms
13 ms
11 ms
12 ms
69 ms
42 ms
21 ms
12 ms
11 ms

We were following everything through the bigdesk plugin. Sometimes the
Search Threadpool's queue and count metrics spike. While running 6 streams
of queries (one distinct query per stream) in parallel, we've seen the
"queue" metric spike up to 45.
In our setup, a simple query can search at least 12 indices (the last 12
months), that is, around 12 primary shards (we only search _primary_first).
We run ES on 8 EC2 m1.large instances with 2 cores each, so the Search
Threadpool has a fixed size
(http://www.elasticsearch.org/guide/reference/modules/threadpool/) of 6.

Does that mean that a query spanning 12 shards on one node would consume
the 6 threads in the pool right away, and the other 6 "tasks" for the
remaining shards would wait in the queue?

We've been monitoring the CPU on the 8 nodes. It rarely goes higher than
40% (generally on larger/full GCs). The instances have 7GB; the JVM is set
to use 3GB (memory usage normally grows from ~1GB up to ~2.2GB, then a full
GC brings it back to ~1GB).

Maybe we could increase the number of threads in the search pool, since we
have so many indices/shards for each query?
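Outside of bigdesk, those threadpool metrics can also be polled directly; a hedged sketch, assuming the 0.90-era nodes stats API flags:

```shell
# Hypothetical: per-node search threadpool stats (active threads,
# queue depth, rejections).
curl -XGET 'localhost:9200/_nodes/stats?thread_pool=true&pretty=true'
```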

Any thoughts?
Thanks!

Felipe Hummel


ok, you have a lot of info/questions in that post; let me try to answer
inline.

On Friday, August 23, 2013 4:31:11 PM UTC+2, Felipe Hummel wrote:

  • Total number of shards is 672 in 65 indices, 8 nodes, (today) replica
    set to 1 (before my first e-mail it was 2), total size 80GB.

  • No custom settings related to allocation

  • _cluster/settings:
    {
    "persistent" : { },
    "transient" : { }
    }

  • No, we created a new cluster to upgrade to 0.90.*

  • We're gonna try increasing the cluster.routing.allocation.balance.index

given the screenshots you showed me, this might not necessarily help, but
you can try. In your setup I'd rather think about reducing the number of
shards such that a single request hits only a minority of the nodes you
have. Currently you have 5 shards and 8 nodes; I would rather go with
either 2 or 3 shards to make sure you only hit some of the nodes. I am not
sure how big your indices are, but they seem rather smallish? Fewer shards
might also reduce memory consumption. This way you might get a more stable
cluster and more stable response times, since two requests are less likely
to compete for resources.
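Since the shard count of an existing index cannot be changed, this would only apply to newly created monthly indices; a hedged sketch (the index name is a placeholder):

```shell
# Hypothetical: create the next monthly index with 3 shards instead
# of the default 5.
curl -XPUT 'localhost:9200/myindex-2013.09' -d '
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}'
```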

  • I'm attaching 4 screens from HEAD plugin. I guess that's the best way to
    see it. Although, replica was set to 2, yesterday we set replica=0 (for
    some tests) and then replica=1, which the screens show. The worst balanced
    index we manually fixed so it won't show up on the screens.
    The only "hole" I see, is in file 1.png. The third index (left to right)
    does not use one of the nodes. Maybe the replica=0 then replica=1 fixed a
    bit of the other unbalanced indices.

with that many indices I think there is almost no way to prevent what you
are seeing. The algorithm can only balance to a next state that improves
your balance, and at some point you can't move forward without major
re-shuffling, which we don't do, to make sure your cluster doesn't break.

On an (almost) side question: this balance problem was found while we were
investigating a strange behavior. If we execute the same query repeatedly
(dozens of times), the latency is low the majority of the time, but
sometimes it spikes to 2x-10x the original time (latency being the "took"
time returned by ES). We've tracked the queries, and it is probably not
related to concurrency with other queries (the system was almost idle when
the experiment was made), although forcing other queries at the same time
seems to make the problem worse.
A sample of latencies from running the same query back to back (no time in
between); you can see the variance:

17 ms
17 ms
10 ms
19 ms
44 ms
124 ms
11 ms
11 ms
182 ms
17 ms
11 ms
169 ms
13 ms
11 ms
12 ms
69 ms
42 ms
21 ms
12 ms
11 ms

We were following everything through the bigdesk plugin. Sometimes the
Search Threadpool spikes the queue and count metrics. While running 6
streams of queries (one distinct query per stream) in parallel, we've seen
the "queue" metric spike up to 45.
In our setup, a simple query can search at least 12 indices (the last 12
months), that is, around 12 primary shards (we only search _primary_first).
We run ES on 8 EC2 m1.large instances with 2 cores each, so the Search
Threadpool has a fixed size
(http://www.elasticsearch.org/guide/reference/modules/threadpool/) of 6.

I think that by reducing the # of shards you will also largely fix that
problem, since you will have better load balancing etc. Let me ask: why do
you do _primary_first searches? Any reason for this?

Does that mean that a query spanning 12 shards on one node would consume
the 6 threads in the pool right away, and the other 6 "tasks" for the
remaining shards would wait in the queue?

YES
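As a back-of-the-envelope sketch of that queueing (assuming a fixed search pool of 6 and a query fanning out to 12 shards on one node):

```shell
# 12 shard-level tasks arrive at a node with 6 fixed search threads:
# 6 run immediately, the remaining 6 wait in the queue.
POOL_SIZE=6
SHARD_TASKS=12
RUNNING=$(( SHARD_TASKS < POOL_SIZE ? SHARD_TASKS : POOL_SIZE ))
QUEUED=$(( SHARD_TASKS - RUNNING ))
echo "running=$RUNNING queued=$QUEUED"
```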

We've been monitoring the CPU on the 8 nodes. It rarely goes higher than
40% (generally on larger/full GCs). The instances have 7GB; the JVM is set
to use 3GB (memory usage normally grows from ~1GB up to ~2.2GB, then a full
GC brings it back to ~1GB).

Maybe we could increase the number of threads in the search pool, since we
have so many indices/shards for each query?

yeah I personally would reduce the # of shards first...

simon

Any thoughts?
Thanks!

Felipe Hummel


Our complete dataset is about ~60M documents (growing by ~2.5M/month). Each
monthly index ranges from 500K documents (older ones) to 2M (currently). A
good percentage of the queries search all 65 months/indices, and 100% of
them search the last 12 months/indices. With 2M-document indices, we
thought 5 shards => 1 shard ~= 400K documents would be a good setup.

We use _primary_first because when we first added replicas we saw an
increase in query latency. We attributed the increase to the fact that
replicas were "stealing" memory from the primary shards by sharing the
available OS cache.
As our number of queries/second is very low, we don't need replicas for
load balancing, just for failover. In other words, it is an effort to keep
all shards in the OS cache as much as we can. Some queries take > 1 second
when cold, and then go down to < 200 ms when cached.

We're going to study how (much) to decrease the # of shards and (maybe)
increase the search threadpool.

Thanks for the thoughtful answers!

Felipe Hummel
