Elasticsearch rolling restart problem

Hi,
I have one ELS 1.1.2 cluster with 7 nodes and 800 GB of data.

When I shut down a node for any reason, ELS automatically rebalances the
missing shards onto the other nodes.

To prevent this, I tried the following (as specified in the official docs):
{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}

and then I issue a node shutdown.

Indeed, the relevant shards are now unassigned and ELS doesn't try to
reallocate them.

But when I restart the node, they still remain "unassigned".
And then when I set it back:
{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}

=> ELS reallocates the unassigned shards to ALL nodes instead of only to the
restarted node.

What's wrong?
What's the correct procedure?

regards
jean

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cc317335-9412-42dc-b549-74eb91ba9d6b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reallocation to all nodes is the expected behavior.

Jörg

You've followed the right procedure. The problem is that Elasticsearch
doesn't always restore the shards to the node that they came from. If
the restarted shard and the current primary shard have diverged at all, it'll
have to sync files from somewhere to make sure that the restarted shard gets
all the changes. Since shards diverge all the time, even if there aren't
updates while the node is down, you can expect this.

Speeding this process up has been an open issue for many many months.

Nik
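
For reference, the sequence discussed in this thread (disable allocation, restart the node, re-enable allocation) could be scripted roughly as follows. This is only a minimal sketch using the Python standard library. The node address http://localhost:9200 and all helper names are assumptions for illustration, not something taken from the thread, and `restart_node` is a hypothetical callable you would have to supply yourself.

```python
import json
import urllib.request

# Assumed address of any node in the cluster; not taken from the thread.
BASE_URL = "http://localhost:9200"


def allocation_request(mode, base_url=BASE_URL):
    """Build a PUT /_cluster/settings request that sets the transient
    cluster.routing.allocation.enable setting to `mode` ("none" or "all")."""
    body = {"transient": {"cluster.routing.allocation.enable": mode}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def unassigned_shards(health):
    """Read the unassigned shard count from a parsed _cluster/health response."""
    return health["unassigned_shards"]


def send(request):
    """Send a request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def rolling_restart_step(restart_node):
    """One iteration: disable allocation, restart one node, re-enable allocation.

    `restart_node` is a hypothetical callable that stops and starts the node
    and returns once it has rejoined the cluster.
    """
    send(allocation_request("none"))
    restart_node()
    send(allocation_request("all"))
    # Report how many shards are still unassigned right after re-enabling.
    health = send(urllib.request.Request(BASE_URL + "/_cluster/health"))
    return unassigned_shards(health)
```

Note that this only automates the settings toggles. As explained above, re-enabling allocation does not guarantee that the shards go back to the restarted node.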

OK, thanks for your explanation.
It's a major concern, as ELS is scalable, and when a node goes down we have a
rebalancing process which can take a lot of time.
I find it strange that this point was not addressed long ago.

I think that in a big cluster (>100 nodes) the cluster would be permanently
rebalancing (consuming network and performance), as nodes crash frequently.

Is it the same if I put the index in read-only mode?

I'm not sure whether putting the cluster in read-only mode will help. I can't do
that with my system, so I can't test it.

I'd be much happier if it only took a minute or two to perform a restart
on each node rather than the hours it can take.

Nik
