I am currently facing an issue with our Elasticsearch cluster.
During an upgrade or restart of the nodes in the cluster, it takes a long time for the cluster to go from yellow to green.
I am following these steps for each restart:
Disable shard allocation
Restart the node
Enable shard allocation
After following these steps, it takes a very long time for the cluster to go from yellow to green.
For the best recovery time you should follow the instructions for a rolling upgrade (except the upgrade bit obviously, but including all the optional steps).
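As a rough illustration of what the full procedure adds on top of the three steps in the question, here is a minimal sketch of the request sequence from the documented rolling upgrade procedure, including the optional synced flush. It only builds the requests and does not talk to a cluster. The function name `rolling_restart_requests` is made up for this example, and it assumes a 5.x-7.x cluster: synced flush (`POST _flush/synced`) was removed in 8.0, and newer documentation disables allocation with `"primaries"` in a persistent setting rather than `"none"` in a transient one.

```python
import json


def rolling_restart_requests(scope="transient"):
    """Return the (method, path, body) requests for restarting one node, in order.

    This is a sketch: the helper name is hypothetical and the sequence assumes
    a 5.x-7.x cluster where synced flush is still available.
    """
    def allocation(value):
        # Body for PUT _cluster/settings that toggles shard allocation.
        return {scope: {"cluster.routing.allocation.enable": value}}

    return [
        # 1. Stop the cluster from reallocating shards while the node is down.
        ("PUT", "/_cluster/settings", allocation("none")),
        # 2. Optional but recommended: synced flush so idle shards recover quickly.
        ("POST", "/_flush/synced", None),
        # 3. (Restart the node here, then confirm it has rejoined.)
        ("GET", "/_cat/nodes", None),
        # 4. Re-enable allocation only once the node is back in the cluster.
        ("PUT", "/_cluster/settings", allocation("all")),
        # 5. Wait for green and watch recoveries before moving to the next node.
        ("GET", "/_cat/health", None),
        ("GET", "/_cat/recovery", None),
    ]


if __name__ == "__main__":
    for method, path, body in rolling_restart_requests():
        print(method, path, json.dumps(body) if body else "")
```

The requests can be sent with curl or Kibana Dev Tools; the important parts are the synced flush before the restart and waiting for the node to rejoin before allocation is re-enabled.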
"5.x" is not a useful version number. It is always better to share the full version number. Things changed a lot between 5.0.0 and 5.6.16. One thing they all have in common is that they are long past the ends of their supported lives and you should definitely upgrade as soon as possible. In particular there are improvements to recovery speed in later versions.
I am not an expert, but if you have 20 nodes with lots of data, it will take time for a restarted node to rejoin and recover. Also, ES takes approx 20 seconds to start properly even on a new cluster, so with lots of data associated it may take some time.
What does the shard activity section of the Kibana monitoring overview show during the recovery time? Do all indices have at least 1 replica? Is forced awareness configured?
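One way to answer the replica question without Kibana is to look at the output of `GET _cat/indices?h=index,rep`, which prints one index name and its replica count per line. The sketch below parses that text and lists indices with no replicas. The function name and the sample index names are made up for illustration, and it assumes the two-column output with no header row.

```python
def indices_without_replicas(cat_output):
    """Return index names whose replica count is 0.

    Expects the plain-text output of `GET _cat/indices?h=index,rep`,
    i.e. lines of the form "<index> <replica count>".
    """
    missing = []
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            # Skip blank or unexpected lines rather than guessing.
            continue
        name, replicas = parts
        if replicas == "0":
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical sample output; real index names will differ.
    sample = "logs-2019.05.01 1\nlogs-2019.05.02 0\n.kibana 1\n"
    print(indices_without_replicas(sample))
```

Any index this reports has no second copy of its shards while its node is restarting, so a check like this is worth running before taking a node down.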
When the node is stopped, all replicas on it are "lost", and for every shard that was a primary there, a replica elsewhere is promoted to primary. When the node is restarted, recovery starts for the missing shards; I think this is where newer versions have changed how fast this happens. For indices that haven't changed during the outage, recovery on current versions is pretty fast. For indices with a lot of changes, recovery is slow. Unless you set priorities on indices, the order is pretty random. There are limits on concurrent shard recoveries; I think the default is 2. If 2 indices with a lot of changes start first, the rest will wait behind them.
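The two knobs mentioned here correspond, as far as I know, to the cluster setting `cluster.routing.allocation.node_concurrent_recoveries` (default 2) and the per-index setting `index.priority`. Without an explicit priority the documented order falls back to index creation date and then index name, so it is not entirely random. The sketch below only builds the settings requests; the function name, the index names and the chosen values are assumptions for illustration, not recommendations.

```python
def recovery_tuning_requests(concurrent_recoveries, index_priorities):
    """Build (method, path, body) requests that tune recovery order and concurrency.

    concurrent_recoveries: value for node_concurrent_recoveries.
    index_priorities: mapping of index name -> index.priority (higher recovers first).
    """
    requests = [
        (
            "PUT",
            "/_cluster/settings",
            {
                "transient": {
                    "cluster.routing.allocation.node_concurrent_recoveries": concurrent_recoveries
                }
            },
        )
    ]
    # Emit the highest-priority indices first, purely for readability.
    for index, priority in sorted(index_priorities.items(), key=lambda kv: -kv[1]):
        requests.append(("PUT", f"/{index}/_settings", {"index.priority": priority}))
    return requests


if __name__ == "__main__":
    # Hypothetical indices: recover today's index before the archive.
    for request in recovery_tuning_requests(4, {"logs-today": 10, "logs-archive": 1}):
        print(request)
```

Raising the concurrency lets more shards recover at once but adds network and disk load on the nodes involved, so it is a trade-off rather than a free speed-up.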
If you enable shard allocation too soon, will the cluster start doing recoveries to other nodes? I suspect it will.
"Lost" is putting it a bit strongly. The data remains on disk and will be re-used in a recovery if possible, saving a good deal of time, particularly if you follow the complete instructions for a rolling upgrade that I mentioned above.
Newer versions expand the scenarios under which Elasticsearch can re-use any existing data, and work continues on this front today, but even in versions as old as 5.1 most unchanged shards should recover pretty much instantly AFAIK.