Restarting many nodes

Peter_Tudt · June 21, 2018, 10:53am

Hi team, this is my first post here.
I work as a sysadmin, which means I do everything.

The scenario:
ES 5.6.0
6 data nodes
Adding 3 master nodes and 2 client nodes (coordinating nodes).

Shard sizes up to 130 GB; the cluster holds about 800 GB.
SSD disks in RAID 0, 1 Gbit network.

I changed elasticsearch.yml on the data nodes.
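The post doesn't say which elasticsearch.yml settings were changed. Assuming the change was to split roles for the new topology (dedicated masters, data-only nodes, coordinating-only nodes), here is a hedged sketch of the 5.x role flags. `ROLE_SETTINGS` and `render_yml` are hypothetical helpers; the setting names `node.master`, `node.data`, `node.ingest` and `discovery.zen.minimum_master_nodes` are the 5.x ones, and the quorum rule (N // 2) + 1 is the usual recommendation for that last setting.

```python
# Hypothetical sketch: elasticsearch.yml role lines per node type, assuming
# Elasticsearch 5.x role flags. Not the poster's actual config.
ROLE_SETTINGS = {
    # Dedicated master-eligible node: manages cluster state, holds no shards.
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    # Data-only node: holds shards, never elected master.
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    # Coordinating-only ("client") node: all roles off, it only routes
    # requests and merges results.
    "coordinating": {"node.master": False, "node.data": False, "node.ingest": False},
}


def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (N // 2) + 1."""
    return master_eligible // 2 + 1


def render_yml(node_type: str, master_eligible: int = 3) -> str:
    """Render the role lines of elasticsearch.yml for one node type."""
    settings = dict(ROLE_SETTINGS[node_type])
    if settings["node.master"]:
        # Only added on master-eligible nodes in this sketch.
        settings["discovery.zen.minimum_master_nodes"] = minimum_master_nodes(master_eligible)
    lines = []
    for key, value in settings.items():
        # YAML booleans are written lowercase; integers as-is.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}: {text}")
    return "\n".join(lines)
```

With 3 dedicated masters the quorum comes out as 2, so the cluster can still elect a master with one master node down.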

I restarted node 1, and the cluster went yellow with initializing and unassigned shards.
After about 1 hour the cluster was green again.

I found this suggestion on the forum:
cluster.routing.allocation.enable "none"
restart node 2
cluster.routing.allocation.enable "all"
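The three steps above map onto plain REST calls. A minimal sketch, assuming a cluster reachable over HTTP (the `base_url` is yours to supply); `allocation_body`, `before_restart`, `after_rejoin` and `send` are hypothetical helpers, while `PUT /_cluster/settings` and `cluster.routing.allocation.enable` are the real API and setting. The synced-flush call is an extra optional step from the 5.x rolling-restart procedure as I understand it; it is not mentioned in this thread.

```python
import json
import urllib.request

# Values accepted by cluster.routing.allocation.enable.
ALLOCATION_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_body(mode: str) -> dict:
    """Body for PUT /_cluster/settings that sets shard allocation to `mode`."""
    if mode not in ALLOCATION_MODES:
        raise ValueError(f"unsupported allocation mode: {mode}")
    # "transient" settings are lost on a full cluster restart;
    # "persistent" would also work here.
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def before_restart() -> list:
    """REST calls to make before stopping the node."""
    return [
        ("PUT", "/_cluster/settings", allocation_body("none")),
        # Optional (assumption, 5.x only): synced flush so that shards with
        # no new writes can skip copying files when the node comes back.
        ("POST", "/_flush/synced", None),
    ]


def after_rejoin() -> list:
    """REST calls to make once the restarted node is back in the cluster."""
    return [("PUT", "/_cluster/settings", allocation_body("all"))]


def send(base_url: str, method: str, path: str, body=None) -> dict:
    """Send one call, e.g. base_url="http://localhost:9200"; returns parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: loop `for m, p, b in before_restart(): send(url, m, p, b)`, restart the service on the node, wait until it has rejoined, then do the same loop over `after_rejoin()`.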

That did not help; shards are still initializing.

I still have 4 more data nodes to go, and after that one more cluster. At this rate that will take me about 12 hours!

Any advice? Thanks.

A_B · June 21, 2018, 11:06am

Hi, this might be a silly question, but did you wait for the restarted node to rejoin the cluster and become available before setting cluster.routing.allocation.enable back to "all"?
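The point here is not to re-enable allocation until the node is back. A minimal polling sketch, assuming the caller supplies a `fetch_health` function that returns the parsed JSON of `GET _cluster/health` (whose response includes a `number_of_nodes` field). `wait_for_nodes` is a hypothetical helper; `sleep` and `clock` are injectable so it can be tried without a cluster.

```python
import time
from typing import Callable, Mapping


def wait_for_nodes(
    fetch_health: Callable[[], Mapping],
    expected: int,
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll cluster health until `expected` nodes are present or time runs out.

    Returns True once number_of_nodes >= expected, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        health = fetch_health()
        if health.get("number_of_nodes", 0) >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

`expected` would be the node count before the restart. The same check can also be pushed to the server with `GET _cluster/health?wait_for_nodes=N&timeout=...`.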

I have had no major problems doing rolling restarts on my cluster with 20 nodes and quite a few TB of data. Largest shards have been 200GB.

My cluster is "rack aware", so I set cluster.routing.allocation.enable to "none", restart ES on all nodes in one "rack", wait until they all show up again in the list of nodes, and then set cluster.routing.allocation.enable back to "all". It usually takes less than a minute for the cluster to go from yellow back to green.
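A rack-at-a-time restart amounts to grouping nodes by their rack attribute and running one disable/restart/enable cycle per group. A minimal sketch, assuming you already have (node name, rack) pairs, for example from a node attribute such as `node.attr.rack_id` combined with `cluster.routing.allocation.awareness.attributes: rack_id` (example names, not taken from this cluster). `restart_batches` and the node names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def restart_batches(nodes: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group (node_name, rack) pairs into one restart batch per rack.

    Restarting a whole batch at once is only safe if shard allocation
    awareness on the rack attribute keeps at least one copy of every
    shard outside that rack.
    """
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for name, rack in nodes:
        by_rack[rack].append(name)
    # Sort racks and node names so the plan is deterministic.
    return [sorted(by_rack[rack]) for rack in sorted(by_rack)]


if __name__ == "__main__":
    nodes = [("es-1", "rack_a"), ("es-2", "rack_b"), ("es-3", "rack_a")]
    for batch in restart_batches(nodes):
        # Per batch: allocation "none", restart these nodes, wait for them
        # to rejoin, allocation "all", then wait for green before moving on.
        print(batch)
```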

I'm not sure whether introducing new nodes causes some sort of rebalancing of shards...

Peter_Tudt · June 21, 2018, 12:13pm

Thanks for the advice, A_B.
I restarted node3 10 minutes ago.
After node3 rejoined the cluster, I waited an extra 3 minutes before re-enabling allocation.

It's actually the INITIALIZING phase that is taking the time.

The replica shards on node3 have all started.
The shards that were primaries on node3 are now primaries on other nodes, and their copies on node3 are INITIALIZING as replicas (two at a time). During this process the network on node3 runs at full speed, with data coming from the nodes that now hold the primaries.
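Node3 is pulling full shard copies over a 1 Gbit link, so a back-of-envelope transfer time helps explain the wait. This is a lower bound under stated assumptions (decimal units, disks not the bottleneck). `transfer_seconds` is a hypothetical helper; its optional throttle stands in for the real `indices.recovery.max_bytes_per_sec` setting, whose value on this cluster isn't given in the thread. The "two at a time" is consistent with the default of 2 for `cluster.routing.allocation.node_concurrent_recoveries`.

```python
from typing import Optional


def transfer_seconds(
    size_gb: float,
    link_gbit: float = 1.0,
    efficiency: float = 1.0,
    throttle_mb_per_s: Optional[float] = None,
) -> float:
    """Lower-bound time in seconds to copy size_gb over the network.

    Decimal units: 1 GB = 1e9 bytes, 1 MB = 1e6 bytes, 1 Gbit/s = 1e9 bit/s.
    efficiency < 1 models protocol overhead; throttle_mb_per_s models a
    recovery throttle (assumed, e.g. indices.recovery.max_bytes_per_sec).
    """
    if size_gb < 0 or link_gbit <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, link speed or efficiency")
    rate = link_gbit * 1e9 / 8 * efficiency  # bytes per second on the wire
    if throttle_mb_per_s is not None:
        rate = min(rate, throttle_mb_per_s * 1e6)
    return size_gb * 1e9 / rate
```

At full line rate a single 130 GB shard needs about 1040 s (roughly 17 minutes); with an illustrative 40 MB/s throttle it needs about 3250 s (roughly 54 minutes). Two concurrent recoveries share the same link, so each one takes roughly twice as long while both are running.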

system · July 19, 2018, 12:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
