Hi!
On my Elasticsearch 5.3.2 cluster it takes nearly 2 minutes to elect a new master. My scenario is a rolling restart in which the master is restarted last. Once it is restarted, the cluster "panics" and searches for a new master, which takes about 2 minutes.
The log files repeat a MasterNotDiscoveredException until the new master is found.
I have added the log output of my restart script below; it should be clear what is going on (I hope):
2017-10-16 15:54:13,874 INFO Restarting loc2elastic2-test-work2-elk-awtest1 (5 of 5 nodes)
2017-10-16 15:54:13,976 INFO Changing shard allocation to none resulted in: {"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"enable":"none"}}}}}
2017-10-16 15:54:13,976 INFO Restarting ES Node: loc2elastic2-test-work2-elk-awtest1
2017-10-16 15:54:14,140 INFO Connected (version 2.0, client OpenSSH_6.6.1p1)
2017-10-16 15:54:14,385 INFO Authentication (publickey) successful!
2017-10-16 15:54:14,723 INFO detected service name: elasticsearch-test-work2-elk-awtest1
2017-10-16 15:54:18,206 INFO * Stopping Elasticsearch Server test-work2-elk-awtest1
...done.
* Starting Elasticsearch Server test-work2-elk-awtest1
[2017-10-16T13:54:17,743][WARN ][o.e.c.l.LogConfigurator ] ignoring unsupported logging configuration file [/etc/elasticsearch/test-work2-elk-awtest1/logging.yml], logging is configured via [/etc/elasticsearch/test-work2-elk-awtest1/log4j2.properties]
...done.
2017-10-16 15:54:18,299 INFO Server loc2elastic2-test-work2-elk-awtest1 has not joined the cluster yet. Waiting 5 more seconds.
2017-10-16 15:54:23,393 INFO Server loc2elastic2-test-work2-elk-awtest1 has not joined the cluster yet. Waiting 5 more seconds.
2017-10-16 15:54:28,612 INFO Server loc2elastic2-test-work2-elk-awtest1 has joined the cluster.
2017-10-16 15:54:58,710 INFO Changing shard allocation to all resulted in: {"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
2017-10-16 15:54:58,710 INFO Retrying...
2017-10-16 15:55:28,819 INFO Changing shard allocation to all resulted in: {"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
2017-10-16 15:55:28,819 INFO Retrying...
2017-10-16 15:55:58,913 INFO Changing shard allocation to all resulted in: {"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
2017-10-16 15:55:58,913 INFO Retrying...
2017-10-16 15:56:21,683 INFO Changing shard allocation to all resulted in: {"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"enable":"all"}}}}}
2017-10-16 15:56:21,879 INFO Waiting for green, current cluster state is: yellow
2017-10-16 15:56:26,981 INFO Waiting for green, current cluster state is: yellow
2017-10-16 15:56:32,162 INFO Waiting for green, current cluster state is: green
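For context, the "Changing shard allocation ... Retrying..." step in the log can be sketched roughly as below. This is a simplified illustration, not the actual script: the function names, the `send` callable (a stand-in for whatever HTTP client is used), and the `max_attempts` and `delay` parameters are all assumptions made for the example.

```python
import json
import time


def build_allocation_body(mode):
    """Return the JSON body for a transient shard-allocation setting."""
    return json.dumps({"transient": {"cluster.routing.allocation.enable": mode}})


def set_allocation(send, mode, max_attempts=10, delay=0.0):
    """Retry the settings update until the cluster acknowledges it.

    `send` is a hypothetical callable(path, body) -> (status, parsed_json);
    it is injected so the retry logic does not depend on a specific client.
    Returns the number of attempts that were needed.
    """
    body = build_allocation_body(mode)
    for attempt in range(1, max_attempts + 1):
        status, response = send("/_cluster/settings", body)
        if status == 200 and response.get("acknowledged"):
            return attempt
        # A 503 with master_not_discovered_exception means that no master
        # has been elected yet, so wait and try again.
        time.sleep(delay)
    raise RuntimeError("no master after %d attempts" % max_attempts)
```

In the log above, the request fails three times with a 503 and is acknowledged on the fourth attempt, once the new master has been elected.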
Any idea if I have configured something wrong?
Take care,
Alex