Rolling restart of an Elasticsearch cluster

abhijat · December 4, 2015, 7:51pm

I have an index with 7 primary shards and 2 replicas. Each shard copy is on its own machine, so there are 21 data nodes.

When I do a rolling restart, I currently check that shard relocation is complete before moving on to restart the next node:

http://$MARVEL_HOST/_cluster/health?wait_for_status=green&wait_for_relocating_shards=0&timeout=15m
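The URL above makes Elasticsearch wait server-side, but if the 15-minute timeout expires it still answers with the usual JSON body, containing `"timed_out":true`, and `curl -s -S` without `--fail` exits 0 for any HTTP response. A script therefore has to inspect the body. A minimal sketch, assuming the default compact JSON output (no `?pretty`) and plain string matching instead of a JSON parser; `health_ok` is a hypothetical helper name:

```shell
#!/bin/sh
# health_ok BODY: succeed only if a _cluster/health response body reports
# "status":"green" and the server-side wait did not time out.
# Assumes compact JSON (the default, i.e. no ?pretty). Plain string matching
# keeps this dependency-free; a real JSON parser would be more robust.
health_ok() {
  case $1 in
    *'"timed_out":true'*) return 1 ;;   # a wait_for_* condition was never met
  esac
  case $1 in
    *'"status":"green"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use with the URL from this post (needs a reachable cluster):
#   body=$(curl -s -S "http://$MARVEL_HOST/_cluster/health?wait_for_status=green&wait_for_relocating_shards=0&timeout=15m")
#   health_ok "$body" || { echo "cluster not ready: $body" >&2; exit 1; }
```

Note that a body can report green and still have timed out, for example when the cluster is green but relocations never reached zero within 15 minutes; the sketch treats that as "not ready".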

Do I need to wait for shard allocation to complete, or should I simply check that the cluster status is green and then move on to the next node?

Also, do I need to worry about initializing shards before restarting another node?

Say I only wait for the cluster to become green and then restart a node that holds a primary shard whose replica is still in the process of relocating. Will I run into any issues? How does Elasticsearch handle such a scenario?

I am using version 1.7.0.

magnusbaeck · December 5, 2015, 7:08pm

> When I do a rolling restart, I currently check that shard relocation is complete before moving on to restart the next node.

Why are shards being reallocated? The recommendation is to disable allocation while a node is being restarted and re-enable it once the node is back up, so the cluster should return to green very quickly without any additional reallocation.

> Do I need to wait for shard allocation to complete, or should I simply check that the cluster status is green and then move on to the next node?

Since you have two replicas of each shard, you don't even have to wait for the cluster to become green: Elasticsearch never places two copies of the same shard on one node, so you can handle two unavailable nodes at a time and every shard still has at least one copy. But sure, taking nodes down one at a time, and only when the cluster is green, means you could suffer an unplanned node loss during the restart without jeopardizing data availability.

abhijat · December 5, 2015, 9:13pm

> Why are shards being reallocated? The recommendation is to disable allocation while a node is being restarted and re-enable it once the node is back up, so the cluster should return to green very quickly without any additional reallocation.

Right. Before I shut down a node, I disable shard allocation with this command:

curl -s -S -XPUT http://$MARVEL_HOST/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'

and once the node has restarted, I re-enable allocation with this command:

curl -s -S -XPUT http://$MARVEL_HOST/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
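The disable, restart, re-enable cycle described above can be scripted per node. This is a sketch for Elasticsearch 1.x, not a drop-in tool. It assumes `$MARVEL_HOST` points at a node other than the one being restarted, and it introduces two hypothetical names that are not from this thread: `EXPECTED_NODES`, the total number of nodes when the whole cluster is up (every node type counts, not just the 21 data nodes), and `restart_node`, a placeholder for however you actually restart a machine, assumed to return only after the node has been stopped and started again. The node is allowed to rejoin (`wait_for_nodes`) before allocation is re-enabled, and the script then waits for green.

```shell
#!/bin/sh
# Sketch of one rolling-restart step for Elasticsearch 1.x.
# Assumptions: MARVEL_HOST is set; EXPECTED_NODES (hypothetical) is the total
# node count; restart_node (hypothetical) must be replaced with a real command.

# Set the transient cluster.routing.allocation.enable value ("none" or "all").
set_allocation() {
  out=$(curl -s -S -XPUT "http://$MARVEL_HOST/_cluster/settings" \
    -d "{\"transient\":{\"cluster.routing.allocation.enable\":\"$1\"}}") || return 1
  case $out in
    *'"acknowledged":true'*) return 0 ;;
    *) echo "settings update failed: $out" >&2; return 1 ;;
  esac
}

# Run a blocking health request with the given wait_for_* parameter;
# fail if the 15-minute server-side wait timed out.
wait_health() {
  out=$(curl -s -S "http://$MARVEL_HOST/_cluster/health?$1&timeout=15m") || return 1
  case $out in
    *'"timed_out":false'*) return 0 ;;
    *) echo "health wait failed: $out" >&2; return 1 ;;
  esac
}

# Hypothetical placeholder: replace with your own restart mechanism.
restart_node() {
  echo "restart_node: not implemented for $1" >&2
  return 1
}

# Disable allocation, restart one node, wait for it to rejoin,
# re-enable allocation, then wait for the cluster to return to green.
restart_one_node() {
  set_allocation none || return 1
  restart_node "$1" || return 1
  wait_health "wait_for_nodes=${EXPECTED_NODES:?set EXPECTED_NODES}" || return 1
  set_allocation all || return 1
  wait_health "wait_for_status=green"
}
```

Calling `restart_one_node` once per host, and stopping at the first non-zero return, gives the one-node-at-a-time behaviour discussed in this thread.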

Once allocation is enabled, I see shards start getting reallocated. I have seen the cluster go green while shards are still being relocated to different machines. And that's where my question is: do I need to wait for shard relocation to complete before I move on to the next node or not?

I am following the rolling restart guidelines on the "Rolling Restarts" page of Elasticsearch: The Definitive Guide [2.x].

magnusbaeck · December 6, 2015, 12:59pm

> Once allocation is enabled, I see shards start getting reallocated. I have seen the cluster go green while shards are still being relocated to different machines.

So reallocation starts after the cluster goes green when you've restarted a single node? I wouldn't have expected it to think there's a need to reallocate. Unless something happened while the node was down, the previous shard equilibrium should remain.

> And that's where my question is: do I need to wait for shard relocation to complete before I move on to the next node or not?

You don't have to wait.
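Following this advice only means dropping `wait_for_relocating_shards=0` from the health request. Green is reported only once every primary and replica copy is active, and a relocating shard keeps serving from its source copy until the new copy is ready, so an in-flight relocation does not by itself leave a shard without an active copy. A minimal sketch, assuming the same `$MARVEL_HOST` as in the thread and compact JSON output (for the `sed` extraction); `wait_green` is a hypothetical helper name:

```shell
#!/bin/sh
# wait_green: block (up to 15m) until the cluster is green, without waiting
# for relocations to finish, and report how many shards are still relocating.
wait_green() {
  body=$(curl -s -S "http://$MARVEL_HOST/_cluster/health?wait_for_status=green&timeout=15m") || return 1
  case $body in
    *'"timed_out":false'*) ;;   # with wait_for_status=green this means green
    *) echo "cluster did not reach green: $body" >&2; return 1 ;;
  esac
  # Pull the relocating_shards count out of the compact JSON body.
  relocating=$(printf '%s\n' "$body" | sed -n 's/.*"relocating_shards":\([0-9]*\).*/\1/p')
  echo "green; relocating_shards=${relocating:-unknown}"
}
```

Logging the relocation count keeps it visible without letting it block the next restart.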

abhijat · December 6, 2015, 6:34pm

> So reallocation starts after the cluster goes green when you've restarted a single node? I wouldn't have expected it to think there's a need to reallocate. Unless something happened while the node was down, the previous shard equilibrium should remain.

I think I understand now. Yes, indexing was still going on while the node was down, so when the node restarts, shards get reallocated once allocation is re-enabled.

I have made the appropriate changes so that my script confirms the cluster is green before restarting the next node.

Thank you for your help, Magnus.
