Even Shard Distribution?

Hey guys,

We've recently set up a 5-node ES cluster, serving our 6-shard / 1-replica
index (we chose 6 back when we only had 3 nodes). We sometimes find a
highly uneven distribution of shards across the nodes. For example, when we
had 3 nodes, 4/6 of the index lived on one node, 2/6 lived on another, and
0/6 lived on the last node. While we do have distributed replicas, we've
noticed a very high "OS load" on the machine that served 4/6 of the
index. In fact, the OS load was so high that it brought the VM to
a screeching halt.

My question is: how can we force a more even distribution of the shards?
I've played around with "cluster.routing.allocation.balance.*", but those
settings had little to no effect. And, from my understanding of the docs,
"cluster.routing.allocation.awareness" is aimed at a more even distribution
across zones, rather than evening out shards within the same zone.
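
For reference, here's roughly how we've been applying those settings (the
values are just what we experimented with, not recommendations):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "cluster.routing.allocation.balance.shard": 0.50,
    "cluster.routing.allocation.balance.index": 0.45,
    "cluster.routing.allocation.balance.threshold": 1.0
  }
}'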

Thanks in advance for all your help! Much appreciated!


For the 0/6 node, are you sure you don't have some configuration preventing
shards from allocating there?
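
If in doubt, dump the cluster and index settings and look for allocation
filters (routing.allocation.include/exclude/require and the like). The
index name below is a placeholder:

curl 'localhost:9200/_cluster/settings?pretty'
curl 'localhost:9200/yourindex/_settings?pretty'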

We use this:
http://git.wikimedia.org/blob/operations%2Fpuppet.git/d2e2989bbafc7f7f730efacaa652a05bec3ef541/modules%2Felasticsearch%2Ftemplates%2Felasticsearch.yml.erb#L420
but it's designed to spread the shards of every index more evenly across
all of our nodes, rather than to even out the total shard count per node.

When we update these dynamically using the API, we've found you sometimes
have to jiggle the shard allocator to force it to start thinking again. So
use the cluster reroute API to manually move a small shard from one node to
another, or something like that.
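
Something along these lines (the index and node names are placeholders):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {"move": {"index": "myindex", "shard": 0,
              "from_node": "node1", "to_node": "node2"}}
  ]
}'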

Nik


Thanks for that, Nik. I'm okay with evenly spreading all the indices,
rather than just the one I'm having issues with. I'll give your config a
try!

Definitely no special configuration on that one. We didn't even realize
there was such a thing as allocation configuration until yesterday (after
the 0/6 allocation happened). The 0/6 node did, however, contain 4 of the 6
replicas... if that makes a difference?
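
(For what it's worth, we've mostly been eyeballing the spread with the _cat
API:

curl 'localhost:9200/_cat/shards?v'

which lists every shard with its primary/replica flag and the node it lives
on.)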



It certainly does. The way the allocation settings are configured now, the
cluster treats replicas as just about the same as primaries. For the most
part they are, too. For us they certainly are.

If you are sure that holding a primary shard is significantly more work
than holding a replica, then raise the "primary" setting to something like
0.5 and lower everything else so the weights total 1.

That whole "total 1" thing isn't required, but it makes the numbers easier
to think about.
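
For example (just a sketch; the exact split is up to you):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "cluster.routing.allocation.balance.primary": 0.5,
    "cluster.routing.allocation.balance.shard": 0.25,
    "cluster.routing.allocation.balance.index": 0.25
  }
}'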


Got it. To be honest, I was pretty sure of that up until this AM, when
that same OS load spike happened again. But this time the shards were
allocated more evenly, so I'm not sure that's even the problem anymore. I
just posted a new thread with more information about the load spike issue.
Would you mind taking a look?

Thanks for all your help, Nik.


Hi Nikolas,

We tried the following:
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "balance": {
            "index": "0.05",
            "shard": "0.05",
            "primary": "0.9",
            "threshold": "1.0"
          }
        }
      }
    }
  },
  "transient": {}
}

And it doesn't seem to do much... Also, when we trigger the movement of a
shard manually, it only moves that one shard and nothing else; it doesn't
cause a bigger rebalance. The cluster still ends up with the primaries
balanced across only 2 of the 5 nodes, while two others hold only replicas.
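
For what it's worth, here's roughly how we're counting primaries per node
(this assumes the _cat/shards output puts the primary/replica flag in the
third column and the node name in the last):

curl -s 'localhost:9200/_cat/shards' | \
  awk '$3 == "p" { count[$NF]++ } END { for (n in count) print n, count[n] }'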

Is there anything we're missing?

Thanks!

Daniel


I don't imagine there is anything you are missing. I don't know where to go
from here. Sorry!