Cluster turning into green state takes long time

AKSHAY_ARDAK · October 6, 2019, 7:31am

Hi Folks,

I am currently facing an issue with my Elasticsearch cluster.
During an upgrade or restart of the nodes, the cluster takes a long time to go from yellow to green.

I follow these steps for each restart:

1. Disable shard allocation
2. Restart the node
3. Enable shard allocation

After following the above steps, the cluster takes a very long time to go from yellow to green.
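
For reference, steps 1 and 3 map onto the cluster settings API roughly as follows. This is a minimal sketch, not a tested procedure: the `ES_URL` value and the helper names `allocation_request` and `send` are assumptions made up for illustration, and only the Python standard library is used. The setting `cluster.routing.allocation.enable` and the `_cluster/settings` endpoint are the ones described in the Elasticsearch 5.x documentation.

```python
import json
import urllib.request

# Assumption: a node reachable without authentication at this address.
ES_URL = "http://localhost:9200"

VALID_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_request(mode, base_url=ES_URL):
    """Build (but do not send) a PUT _cluster/settings request that
    sets cluster.routing.allocation.enable to the given mode."""
    if mode not in VALID_MODES:
        raise ValueError("unknown allocation mode: %r" % (mode,))
    body = {"transient": {"cluster.routing.allocation.enable": mode}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With these helpers, `send(allocation_request("none"))` would correspond to step 1 and `send(allocation_request("all"))` to step 3, once the restarted node has rejoined the cluster.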

Elasticsearch version: 5.x

Please advise.

Christian_Dahlqvist · October 6, 2019, 7:39am

How large is the cluster? How much data is there in it? How many indices and shards do you have?

DavidTurner · October 7, 2019, 5:38am

For the best recovery time you should follow the instructions for a rolling upgrade (except the upgrade bit obviously, but including all the optional steps).
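
One of the optional steps in the 5.x rolling upgrade instructions is a synced flush (`POST _flush/synced`) before the node is stopped. It marks idle shard copies with a sync ID so that matching copies can be reused on restart without copying segment files. The sketch below only interprets the response of that call; the helper name `indices_with_failed_sync` and the sample response are made up for illustration, with the response shape following the documented format.

```python
def indices_with_failed_sync(flush_response):
    """Return {index_name: failed_copies} for every index where the
    synced flush did not succeed on all shard copies."""
    failed = {}
    for name, stats in flush_response.items():
        if name == "_shards":
            # Cluster-wide totals, not a per-index entry.
            continue
        if stats.get("failed", 0) > 0:
            failed[name] = stats["failed"]
    return failed


# Made-up sample in the documented shape of a POST _flush/synced response.
sample_response = {
    "_shards": {"total": 6, "successful": 4, "failed": 2},
    "logs-a": {"total": 2, "successful": 2, "failed": 0},
    "logs-b": {"total": 4, "successful": 2, "failed": 2},
}

print(indices_with_failed_sync(sample_response))
```

Indices reported here are typically ones that were still being written to; their shard copies are the ones most likely to need a slower recovery after the restart, so it can be worth pausing indexing and retrying the synced flush.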

"5.x" is not a useful version number. It is always better to share the full version number. Things changed a lot between 5.0.0 and 5.6.16. One thing they all have in common is that they are long past the ends of their supported lives and you should definitely upgrade as soon as possible. In particular there are improvements to recovery speed in later versions.

AKSHAY_ARDAK · October 7, 2019, 10:32am

@Christian_Dahlqvist The cluster is large: 20 data nodes, each with 1.7 TB of disk, of which ~500 GB is used.
Each node holds 20 shards.
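
As a back-of-envelope check on those numbers, here is a rough worst-case estimate of how long one node would take to come back if none of its on-disk data could be reused and every shard had to be copied in full. The throttle value is an assumption (40 MB/s, which is my understanding of the default for `indices.recovery.max_bytes_per_sec` in 5.x), shards are assumed to be evenly sized, and decimal units are used throughout.

```python
def full_copy_seconds(data_bytes, bytes_per_sec):
    """Seconds needed to copy data_bytes at a sustained rate."""
    return data_bytes / bytes_per_sec


NODES = 20                           # from the post
SHARDS_PER_NODE = 20                 # from the post
USED_BYTES_PER_NODE = 500 * 10**9    # ~500 GB used per node, from the post
ASSUMED_RATE = 40 * 10**6            # assumption: 40 MB/s recovery throttle

total_shards = NODES * SHARDS_PER_NODE
avg_shard_bytes = USED_BYTES_PER_NODE / SHARDS_PER_NODE
node_seconds = full_copy_seconds(USED_BYTES_PER_NODE, ASSUMED_RATE)

print("shards in cluster:", total_shards)
print("average shard size (GB):", avg_shard_bytes / 10**9)
print("worst-case full copy of one node (hours):", round(node_seconds / 3600, 1))
```

If recovery really takes hours per node, that suggests shards are being copied in full rather than reused from disk.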

AKSHAY_ARDAK · October 7, 2019, 10:33am

@DavidTurner The current version is 5.1.1-1.

NewmazN24 · October 7, 2019, 3:01pm

I am not an expert, but if you have 20 nodes with lots of data, it will take time for them to connect. Also, ES takes approximately 20 seconds to start properly on a new cluster, so with lots of data associated it may take some time.

rugenl · October 7, 2019, 6:02pm

What does the shard activity in the Kibana monitoring overview show during the recovery? Do all indices have at least 1 replica? Are you using forced awareness?
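
If Kibana monitoring is not available, similar information can be pulled from `GET _cluster/health` (and, per shard, from `GET _cat/recovery?v`). A minimal sketch of summarising a health response follows; the helper name `recovery_progress` and the sample values are made up, while the field names are those returned by the cluster health API.

```python
def recovery_progress(health):
    """Pick out the fields of a _cluster/health response that show
    how far a yellow cluster is from green."""
    return {
        "status": health["status"],
        "initializing": health["initializing_shards"],
        "unassigned": health["unassigned_shards"],
        "percent_active": health["active_shards_percent_as_number"],
    }


# Made-up sample: a cluster partway through recovering one node.
sample_health = {
    "status": "yellow",
    "initializing_shards": 2,
    "relocating_shards": 0,
    "unassigned_shards": 18,
    "active_shards_percent_as_number": 95.0,
}

print(recovery_progress(sample_health))
```

Polling this every few seconds during a restart shows whether shards recover steadily or whether a couple of slow recoveries hold up the rest.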

When the node is stopped, all replicas there are "lost"; for all shards that were primaries, a replica will be promoted to primary. When the node is restarted, recovery will start for the missing shards. I think this is where newer versions have changed how fast this happens. For indices that haven't changed during the outage interval, recovery on current versions is pretty fast. For indices with a lot of changes, recovery is slow. Unless you set priorities on indices, the order is pretty random. There are limits on concurrent shard recoveries; I think the default is 2. If 2 indices with a lot of changes start first, the rest will wait behind them.
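
The two knobs mentioned here are, as far as I know, `cluster.routing.allocation.node_concurrent_recoveries` (default 2) and the per-index `index.priority` setting. Below is a small sketch of the request bodies involved; the helper names are made up, and the values used are purely illustrative rather than recommendations.

```python
def concurrent_recoveries_body(limit):
    """Body for PUT _cluster/settings raising or lowering the number of
    concurrent shard recoveries allowed per node."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return {
        "transient": {
            "cluster.routing.allocation.node_concurrent_recoveries": limit
        }
    }


def index_priority_body(priority):
    """Body for PUT <index>/_settings; indices with a higher
    index.priority are recovered first."""
    return {"index.priority": priority}


print(concurrent_recoveries_body(4))
print(index_priority_body(10))
```

Giving the busiest or most important indices a higher priority means they are not left waiting behind slower recoveries.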

If you enable shard allocation too soon, will the cluster start doing recoveries to other nodes? I suspect it will.

DavidTurner · October 7, 2019, 8:32pm

"Lost" is putting it a bit strongly. The data remains on disk and will be re-used in a recovery if possible, saving a good deal of time, particularly if you follow the complete instructions for a rolling upgrade that I mentioned above.

Newer versions expand the scenarios under which Elasticsearch can re-use any existing data, and work continues on this front today, but even in versions as old as 5.1 most unchanged shards should recover pretty much instantly AFAIK.

rugenl · October 7, 2019, 8:59pm

Yeah, my mind knew what I was trying to say, it just didn't make it to my fingers.

