cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)

Hi,

I am trying to use cluster.routing.allocation.enable to speed up node
restarts. As I understand it, if I set cluster.routing.allocation.enable to
"none", restart a node, then set cluster.routing.allocation.enable to
"all", the shards that go UNASSIGNED when the node goes down should start
back up on the same node they were assigned to previously. But in practice
when I do this, the shards get assigned across the entire cluster when I
set cluster.routing.allocation.enable back to "all", and then after that,
some amount of rebalancing happens.

How can I avoid this, and make shards on a restarted node come back on the
same node?

To be clear, here's exactly the sequence of events:

  1. curl -XPUT -s $host:$port/_cluster/settings?pretty=1 -d '{"persistent":{"cluster.routing.allocation.enable": "none"}}'
  2. service elasticsearch stop on one node of a 3 node cluster
    (discovery.zen.minimum_master_nodes: 2)
  3. shards that were assigned to the now stopped node show as UNASSIGNED
  4. service elasticsearch start on the same node as in (2)
  5. wait a few minutes - shards mentioned in (3) still show as UNASSIGNED,
    each node sees the full cluster (/_cat/nodes)
  6. curl -XPUT -s $host:$port/_cluster/settings?pretty=1 -d '{"persistent":{"cluster.routing.allocation.enable": "all"}}'
  7. UNASSIGNED shards mentioned in (3) begin being assigned across all nodes
    in the cluster
  8. After all UNASSIGNED shards are assigned, some start rebalancing
    (migrating to other nodes)
  9. Cluster is happy
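Steps 1, 3/5 and 6 above can be scripted roughly as follows. This is a minimal sketch, not the poster's actual script: the function names `allocation_body`, `set_allocation` and `count_unassigned` are hypothetical. It reuses the `$host` and `$port` variables from the commands above, and it assumes the Elasticsearch 1.x values for `cluster.routing.allocation.enable` (`all`, `primaries`, `new_primaries`, `none`) and the default `_cat/shards` layout, where the fourth column is the shard state.

```shell
# Hypothetical helpers for the allocation steps of a node restart.

# Print the JSON body that sets cluster.routing.allocation.enable.
# $1: "persistent" or "transient"; $2: the allocation value.
allocation_body() {
  case "$1" in
    persistent|transient) ;;
    *) echo "scope must be persistent or transient" >&2; return 1 ;;
  esac
  case "$2" in
    all|primaries|new_primaries|none) ;;
    *) echo "unknown allocation value: $2" >&2; return 1 ;;
  esac
  printf '{"%s":{"cluster.routing.allocation.enable":"%s"}}\n' "$1" "$2"
}

# Apply the setting to the cluster (steps 1 and 6).
set_allocation() {
  body=$(allocation_body "$1" "$2") || return 1
  curl -XPUT -s "$host:$port/_cluster/settings?pretty=1" -d "$body"
}

# Count UNASSIGNED shard copies in `_cat/shards` output read from stdin
# (steps 3 and 5); column 4 is assumed to hold the shard state.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}
```

For example: `set_allocation persistent none` before stopping the node, `curl -s "$host:$port/_cat/shards" | count_unassigned` while it restarts, and `set_allocation persistent all` once it has rejoined. Validating the value locally catches typos before anything is sent to the cluster.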

The amount of data in this cluster is very large, and this process can take
close to 24 hours. So I'd like very much to avoid that for routine restarts.

Thanks.
Andy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/34bb65f7-a286-46f7-a9a1-5f4e72f06926%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

On Wed, Jul 02, 2014 at 05:43:26AM -0700, Andrew Davidoff wrote:

> How can I avoid this, and make shards on a restarted node come back on the
> same node?

Hello,

I have exactly the same issue.
My objective is to write a rolling restart script that waits for a green
cluster state before restarting a node.
I use:

curl -XPUT -s $host:$port/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "new_primaries"}}'

to allow the cluster to keep working (and to be able to create indices)
during the restart.

But I see the same issue: the node is back up, but nothing happens until I
enable allocation for all shards again.

I have gone through the Elasticsearch documentation on recovery, the
gateway, and cluster settings without finding any parameter to enable or
configure this initial recovery of local indices.
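A rolling restart script like the one described above has to block until the cluster is green before moving on. One way to sketch that, assuming the `_cluster/health` API's `wait_for_status` and `timeout` parameters and the same `$host`/`$port` variables; the helper names, the 60s server-side timeout and the default of 30 attempts are illustrative choices, not values from this thread:

```shell
# Hypothetical helpers for the "wait for green" step of a rolling restart.

# Succeed if the _cluster/health JSON read on stdin reports "green".
health_is_green() {
  grep -q '"status" *: *"green"'
}

# Poll _cluster/health until the cluster is green, giving up after $1
# attempts (default 30). Each request asks the server to wait up to 60s.
wait_for_green() {
  attempts=${1:-30}
  while [ "$attempts" -gt 0 ]; do
    if curl -s "$host:$port/_cluster/health?wait_for_status=green&timeout=60s" |
        health_is_green; then
      return 0
    fi
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      sleep 1   # brief pause in case the request failed immediately
    fi
  done
  echo "cluster did not reach green" >&2
  return 1
}
```

In the sequence described in this thread, that would mean: set allocation to `new_primaries` (transient), restart the node, set allocation back to `all`, then call `wait_for_green` before touching the next node. Matching the `"green"` string with `grep` avoids depending on a JSON tool, at the cost of a fairly loose match.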

--
Grégoire

Andrew,

Have you found a solution (or explanation) to your issue?
We are using elasticsearch 1.1.1. What about you?

--
Grégoire

On Mon, Jul 7, 2014 at 4:16 AM, Grégoire Seux <g.seux@criteo.com> wrote:

> Andrew,
>
> Have you found a solution (or explanation) to your issue?
> We are using elasticsearch 1.1.1. What about you?

Hi,

I haven't learned anything new. To be clear about my problem: I am
aware that I must re-enable allocation after having disabled it. My issue
is that I expect all the UNASSIGNED shards to go back to the same
node, but some do not, only to get rebalanced back there later. I am
running elasticsearch 1.2.1.
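To see exactly which shard copies did not come back to their original node, one option is to snapshot shard placement before the restart and compare it afterwards. A minimal sketch: `shard_placement` is a hypothetical helper, and it assumes `_cat/shards` honours the `h` parameter for choosing columns (`index,shard,prirep,state,node`). Node names can contain spaces, so everything after the state column is treated as the node name.

```shell
# Hypothetical helper: normalise `_cat/shards` output into one line per
# shard copy ("index shard prirep node") so that snapshots taken before
# and after a restart can be diffed. Expects the columns requested with
# ?h=index,shard,prirep,state,node on stdin.
shard_placement() {
  awk '{
    node = "-"                      # UNASSIGNED rows have no node column
    if (NF >= 5) {
      node = $5
      for (i = 6; i <= NF; i++)     # re-join node names that contain spaces
        node = node " " $i
    }
    print $1, $2, $3, node
  }' | LC_ALL=C sort
}
```

For example, pipe `curl -s "$host:$port/_cat/shards?h=index,shard,prirep,state,node"` into `shard_placement > before.txt` before stopping the node, repeat into `after.txt` once the cluster is green again, and run `diff before.txt after.txt`. Lines that differ are shard copies that ended up somewhere else. Sorting in the C locale keeps both snapshots in a stable, comparable order.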

Andy

I guess I'll ask about this once more for now. This happened again today. I
set allocation to new_primaries, restarted a node, and set allocation back to
all. The cluster is now assigning the shards that were on the restarted node
across all nodes, and when it's done, which will probably take a day, it will
likely rebalance by moving them back to the original node. I have to assume
I'm doing something wrong here. Am I?

Thanks for any advice.
Andy

Have you changed your gateway settings?

It still remains a bit of black magic to me. Sometimes it works, sometimes
it does not.

Cheers,

Ivan

On Mon, Jul 28, 2014 at 1:52 PM, Andrew Davidoff <davidoff@qedmf.net> wrote:

> I guess I'll ask about this once more for now. [...]

On Tuesday, July 29, 2014 3:27:13 PM UTC-4, Ivan Brusic wrote:

> Have you changed your gateway settings?
> .../current/modules-gateway.html#recover-after
>
> It still remains a bit of black magic to me. Sometimes it works, sometimes
> it does not.

Ivan,

I have read over that documentation several times, and I honestly don't
understand how it would help me. I don't mean that I am unwilling to try
it. Those settings read like they control when recovery begins, but my
problem isn't that recovery starts when I don't want it to. It's that when
I start it (by setting shard allocation back to "all"), shards that I'd
expect to stick with the node they were previously on get assigned to other
nodes, and then ultimately get rebalanced back to the original node.

At this point I am finding that for quick restarts, just doing them with no
additional prep work allows me to recover in ~30m, vs ~24h. So for now I'm
just going to do that. Whatever I am doing wrong here just isn't at all
clear to me.

Thanks for your advice. If I have misunderstood the settings you pointed me
at and you think you can help me understand, I'd be grateful for more
information.

Andy

The idea is that these settings should delay the cluster before a rebalance
occurs, but even with them in place, I often find that shards are moved
immediately.

Are you using the default store throttling settings? I found them to be
quite low.
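If the store throttling mentioned here is `indices.store.throttle.max_bytes_per_sec` (an assumption; the message does not name the setting), it can be changed through the same `_cluster/settings` API used earlier in the thread. A minimal sketch with hypothetical helper names, reusing `$host`/`$port`; the value passed in is illustrative, not a recommendation.

```shell
# Hypothetical helpers; assumes indices.store.throttle.max_bytes_per_sec is
# the store throttling setting in question and is dynamically updatable.

# Print the cluster-settings body that sets store throttling transiently.
# $1: a byte-size value such as "100mb" (example value only).
store_throttle_body() {
  case "$1" in
    ''|*[!0-9a-z]*) echo "invalid size: $1" >&2; return 1 ;;
  esac
  printf '{"transient":{"indices.store.throttle.max_bytes_per_sec":"%s"}}\n' "$1"
}

# Send the setting to the cluster.
set_store_throttle() {
  body=$(store_throttle_body "$1") || return 1
  curl -XPUT -s "$host:$port/_cluster/settings" -d "$body"
}
```

For example, `set_store_throttle 100mb`. Because the setting is `transient`, it does not survive a full cluster restart, which suits a temporary tuning experiment.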

Cheers,

Ivan

On Wed, Jul 30, 2014 at 6:02 AM, Andrew Davidoff <davidoff@qedmf.net> wrote:

> I have read over that documentation several times [...]

I've seen this as well, Ivan, and a few people on IRC have commented on the
same thing: shards that are local are not simply being initialised, but are
being reallocated elsewhere.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 31 July 2014 03:54, Ivan Brusic <ivan@brusic.com> wrote:

> The idea is that the cluster should be delayed when a cluster rebalance
> occurs [...]