Why does it take time for an Elasticsearch node to go "green" after being restarted?

shaunak · April 30, 2015, 11:32pm

Elasticsearch (ES) keeps a checksum for each shard so that, once a shard has been copied to a different node, it can verify that the copy is valid.

However, shard copies do diverge at the file level: once a shard copy has started, each copy merges segments independently. For shard relocation that is fine.

After a full restart, once ES and the primary shards have started, the checksums of the replica shards differ from those of the primary shards (for the reason explained above). Instead of reusing the existing replica shard, ES makes a fresh copy of the primary shard and uses that. This copying takes time, which is why getting from yellow state to green state can take a long time.
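To make the divergence concrete, here is a toy model in Python. It is only a sketch and not how ES or Lucene actually store or checksum segments: the segment representation, the `file_checksums` and `merge` helpers, and the use of SHA-1 are all assumptions made for illustration. It shows two copies that hold the same documents but share no identical files once one of them has merged.

```python
import hashlib

def file_checksums(segments):
    """Return one checksum per segment "file" (toy stand-in, not the real ES format)."""
    return {hashlib.sha1(b"".join(seg)).hexdigest() for seg in segments}

def merge(segments):
    """Combine all segments into a single segment, as a background merge might."""
    return [[doc for seg in segments for doc in seg]]

docs = [b"doc1", b"doc2", b"doc3", b"doc4"]

# Right after the replica is copied, both copies have identical files.
primary = [docs[:2], docs[2:]]
replica = [docs[:2], docs[2:]]
identical_before = file_checksums(primary) == file_checksums(replica)

# Each copy merges independently; here only the primary happens to merge.
primary = merge(primary)

# Files whose checksums match on both sides could be reused; none do.
reusable = file_checksums(primary) & file_checksums(replica)

# The documents themselves are still the same on both copies.
same_docs = sorted(d for s in primary for d in s) == sorted(d for s in replica for d in s)
```

In this model `reusable` ends up empty even though `same_docs` is true, which mirrors why the replica gets copied again from the primary.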

We plan to improve this in a future release so that ES can safely reuse the replica shard and no shard copying is needed for replica shards to reach the started state.

In the meantime, you can increase the following settings to speed this up:

indices.recovery.max_bytes_per_sec (defaults to 20mb/s)
indices.recovery.concurrent_streams (defaults to 3)

These settings define the limits at the node level.
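As a rough sketch, the two settings above could be raised at runtime through the cluster update settings API (`PUT /_cluster/settings`). The helper name `build_recovery_settings_request`, the `http://localhost:9200` address, and the values `100mb` and `5` are assumptions chosen for illustration, not recommendations. Check that both settings exist in your ES version before relying on this.

```python
import json
import urllib.request

def build_recovery_settings_request(base_url, max_bytes_per_sec, concurrent_streams):
    """Build (but do not send) a PUT request that raises the recovery limits."""
    body = {
        # "transient" settings are lost on a full cluster restart.
        "transient": {
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
            "indices.recovery.concurrent_streams": concurrent_streams,
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Hypothetical node address and illustrative values.
request = build_recovery_settings_request("http://localhost:9200", "100mb", 5)

# To actually apply the change against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Higher limits let recovery use more network and disk bandwidth, so it is worth watching the impact on indexing and search while replicas are being copied.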
