Noob Question: Why is restarting a node anything other than instantaneous?

Why is restarting a node anything other than instantaneous?

That is, my understanding is that the default configuration sets up a local
gateway such that the node has a copy of all the shards it had before it
was restarted.

But the behavior I see is that if I restart a node, the node has to recover
each shard in turn, which takes a while.

I get that it would have to replay the operation log against the shard, but
it seems like I get this behavior even when we're not actively writing to
the cluster.

Is there some config thing I missed? Is there something I should check?

Pierce

(Very happy with Elasticsearch; it made our eReader search at Chegg much
faster.)


Hi,

If I understand you correctly, you are asking why your node replays the
transaction log upon restart. Is that correct?
This is because your translog is not empty; you can empty it with the flush
operation [1].

[1]

Regards,
Lukas
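
For reference, a minimal flush call, assuming a node reachable on
localhost:9200 (myindex below is a placeholder index name):

    # flush every index on the cluster
    curl -XPOST 'localhost:9200/_flush'

    # or flush a single index
    curl -XPOST 'localhost:9200/myindex/_flush'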


No, it was more that I was worried there was some gotcha I'd missed, where
I hadn't set up the local gateway or something because it's not on by
default, even though I think it IS on by default.

But from what you just said, I'm gathering that before I bounce nodes, I
should explicitly flush, so that each node marks its current position in
the transaction log. Otherwise I'm relying on the automatic flush, which
might have happened a while ago. Since I haven't been doing that, nodes
have to replay the translog since the last flush.

Did I read between the lines correctly?

Pierce


If you're restarting nodes in a cluster, use this:

    curl -XPUT 'eshost.domain.com:9200/_cluster/settings' -d '
    {"transient": {"cluster.routing.allocation.disable_allocation": true}}'

Then when the node restarts it won't need to reallocate shards; it'll reuse
the ones it currently hosts and reinitialise them, which cuts down on
recovery time dramatically.

When the node is restarted, just send the same PUT with the setting set to false.
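
A minimal sketch of that re-enable call, reusing the placeholder hostname
from above:

    curl -XPUT 'eshost.domain.com:9200/_cluster/settings' -d '
    {"transient": {"cluster.routing.allocation.disable_allocation": false}}'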

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Ah, the plot thickens.

So you're saying that when I naively restart a node (say, because I upgraded
ES), it's ignoring its existing shards and reallocating the shards from
other nodes in the cluster. That's why I have to wait while they bootstrap
from other nodes. Because basically, it's coming up and thinking "I have
zero shards; I should get one".

That is:

I shut down a node.

The cluster returns the shards to the allocation pool. Basically, it's
recorded that the node has zero shards.

I start the node. Node talks to cluster.

Cluster says "Welcome, you have no shards yet." Node says "OK." The
cluster says "Hey, you can have one of these shards."

Node starts copying shards from elsewhere in the cluster.

There's no optimization that says "I'm node 6, I want shard 8... oh wait,
I already have shard 8. Done!"

Is that right?

Your faster way:

I disable allocation.

I shut down a node.

The cluster doesn't do anything; nodes keep whatever shards they have.

I start the node. Node talks to cluster. Cluster says "Welcome."

Node says "OK. BTW, I have shards 5, 6, 7, 8, 9 in this state." The cluster
says, "Great, you need to catch up shards 8 and 9."

Node catches up 8 and 9 by getting a translog delta from one of the other
replicas.

Node now has shards 5, 6, 7, 8, 9 caught up.

I turn on allocation.

Everything is done; the cluster is whole.

Is that closer?
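
Pulling the thread's advice together, a consolidated restart sketch might
look like this (a sketch only: eshost.domain.com is the placeholder
hostname from above, and the flush step borrows Lukas's earlier tip to
shorten translog replay):

    # 1. Tell the cluster not to reallocate shards while the node is down.
    curl -XPUT 'eshost.domain.com:9200/_cluster/settings' -d '
    {"transient": {"cluster.routing.allocation.disable_allocation": true}}'

    # 2. Flush, so there is little translog left to replay on recovery.
    curl -XPOST 'eshost.domain.com:9200/_flush'

    # 3. Restart the node however you normally do, then wait for recovery.
    curl 'eshost.domain.com:9200/_cluster/health?wait_for_status=green&timeout=10m'

    # 4. Re-enable allocation once the node has its shards back.
    curl -XPUT 'eshost.domain.com:9200/_cluster/settings' -d '
    {"transient": {"cluster.routing.allocation.disable_allocation": false}}'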



Yep, that's the concept.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Ah, and kopf has a "lock/unlock" button to do just that.

Ok, trying that...

Hmmm... it didn't quite work for me. Of course, I also did some other stuff,
like switching the primary and changing the number of replicas at the same
time, but basically the restarting nodes came up with zero shards.
