Red cluster after master node restart

We're in the process of reconfiguring our Elasticsearch clusters to have separate client and master nodes. While adding the master nodes was non-intrusive, the rolling restart of the data nodes after setting node.master to false turned out to be disruptive, causing a red cluster.
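
For context, the role split itself is just the usual node.master / node.data combination in elasticsearch.yml; roughly, per node type (values illustrative, not our exact files):

# dedicated master nodes
node.master: true
node.data: false

# client nodes
node.master: false
node.data: false

# data nodes
node.master: false
node.data: true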

Our setup, outlined:

ES 1.7.5
4 client nodes
12 data nodes
3 master nodes

Our elasticsearch.yml has:

discovery.zen.ping.unicast.hosts: ["esmaster1", "esmaster2", "esmaster3", "esclient1", "esclient2", "esclient3", "esclient4", "esdata1", "esdata2", "esdata3", "esdata4", "esdata5", "esdata6", "esdata7", "esdata8", "esdata9", "esdata10", "esdata11", "esdata12"]

Before the reconfig, discovery.zen.minimum_master_nodes was set to 6, per int((number of nodes / 2) + 1). After the reconfig, we set this to 2, based on the 3 dedicated master nodes.
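
With only the 3 dedicated masters being master-eligible, that works out to int(3 / 2) + 1 = 2, i.e. an elasticsearch.yml line along the lines of:

discovery.zen.minimum_master_nodes: 2

(In 1.7, discovery.zen.minimum_master_nodes is also a dynamic cluster setting, so the new value can be pushed through the cluster settings update API rather than waiting on restarts.)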

The rolling restart was done as follows for each node:

  1. stop all indexers
  2. disable shard allocation
  3. shut down elasticsearch on the node using the shutdown API
  4. restart the elasticsearch service
  5. re-enable shard allocation
  6. wait until the cluster is green
  7. start the indexers

Once the indexers had caught up with real time, I would proceed to the next node. This went according to plan until the last data node was restarted, which happened to be the elected master at the time (I had saved it for last; perhaps I shouldn't have?).
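
For completeness, steps 2 through 6 boil down to a handful of calls against the 1.x REST API. Below is a minimal sketch in Python, not our actual tooling: it assumes the nodes answer HTTP on port 9200, that the init system brings the elasticsearch service back up after the _shutdown call, and the hostnames are purely illustrative.

import json
import time
import urllib.request

CLUSTER = "http://esclient1:9200"   # any node that stays up (hostname illustrative)
NODE = "http://esdata1:9200"        # the node being cycled (hostname illustrative)

def es(method, base, path, body=None):
    # tiny helper around the Elasticsearch 1.x REST API
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(base + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())

def set_allocation(mode):
    # "none" before the restart, "all" once the node is back
    es("PUT", CLUSTER, "/_cluster/settings",
       {"transient": {"cluster.routing.allocation.enable": mode}})

set_allocation("none")                                  # step 2: disable allocation
es("POST", NODE, "/_cluster/nodes/_local/_shutdown")    # step 3: 1.x-only shutdown API
# step 4 happens out of band: the init system restarts the elasticsearch service
set_allocation("all")                                   # step 5: re-enable allocation
while es("GET", CLUSTER, "/_cluster/health")["status"] != "green":
    time.sleep(10)                                      # step 6: wait for green

A real script would also wait for the restarted node to reappear (e.g. via /_cat/nodes) before re-enabling allocation.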

These graphs show exactly the timeframe in which we were impacted.

After the master node was restarted, there was a flood of messages like these in the logs, for each host in the cluster:

[2016-03-25 17:09:47,921][DEBUG][transport.netty          ] [esdata1] connected to node [[#zen_unicast_521_5p2BpJWNQQ681cPF2f9duw#][esdata7][inet[/10.112.34.163:9300]]{master=false}]
[2016-03-25 17:09:49,423][DEBUG][transport.netty          ] [esdata1] connected to node [[#zen_unicast_534_poJN16rLTMqvMRy9ZFZBoA#][esmaster3][inet[/10.112.34.196:9300]]{data=false}]
[2016-03-25 17:09:49,424][DEBUG][transport.netty          ] [esdata1] connected to node [[#zen_unicast_547__KZq2UIwRRGM383K1UWzGg#][esclient4][inet[/10.112.34.193:9300]]{data=false, master=false}]
[2016-03-25 17:09:50,469][DEBUG][action.admin.cluster.health] [esdata1] no known master node, scheduling a retry
[2016-03-25 17:09:50,931][DEBUG][transport.netty          ] [esdata1] disconnecting from [[#zen_unicast_544_NX9W8hKqRzOhPa92RNem0A#][esclient3][inet[/10.112.34.175:9300]]{data=false, master=false}] due to explicit disconnect call
[2016-03-25 17:09:50,932][DEBUG][transport.netty          ] [esdata1] disconnecting from [[#zen_unicast_13#][esdata1][inet[esdata6/10.112.34.162:9300]]] due to explicit disconnect call

This goes on for a while, until something starts happening:

[2016-03-25 17:10:17,695][INFO ][discovery.zen            ] [esdata1] failed to send join request to master [[esmaster2][2SDaJlgRS1GcTyzmlIDI9g][esmaster2][inet[/10.112.34.195:9300]]{data=false}], reason [RemoteTransportException[[esmaster2][inet[/10.112.34.195:9300]][internal:discovery/zen/join]]
; nested: ElasticsearchIllegalStateException[Node [[esmaster2][2SDaJlgRS1GcTyzmlIDI9g][esmaster2.][inet[/10.112.34.195:9300]]{data=false}]
not master for join request from [[esdata1][3j0Njy7xTRyr6bmua4fZdA][esdata1.][inet[/10.112.34.157:9300]]{master=false}]]; ], tried [3] times
[2016-03-25 17:10:17,695][DEBUG][cluster.service          ] [esdata1] processing [finalize_join ([esmaster2][2SDaJlgRS1GcTyzmlIDI9g][esmaster2][inet[/10.112.34.195:9300]]{data=false})]: execute
[2016-03-25 17:10:17,695][DEBUG][cluster.service          ] [esdata1] processing [finalize_join ([esmaster2][2SDaJlgRS1GcTyzmlIDI9g][esmaster2][inet[/10.112.34.195:9300]]{data=false})]: took 0s no change in cluster_state

This goes on for ~7 minutes, until finally:

[2016-03-25 17:17:12,543][DEBUG][discovery.zen            ] [esdata1] filtered ping responses: (filter_client[true], filter_data[false])
[2016-03-25 17:17:13,083][DEBUG][discovery.zen.fd         ] [esdata1] [master] restarting fault detection against master [[esmaster1][gi7qMafJS4OXHk9ujvAlIQ][esmaster1][inet[/10.112.34.194:9300]]{data=false}], reason [new cluster state received and we are monitoring the wrong master [null]]
[2016-03-25 17:17:13,084][DEBUG][discovery.zen            ] [esdata1] got first state from fresh master [gi7qMafJS4OXHk9ujvAlIQ]
[2016-03-25 17:17:13,085][DEBUG][cluster.service          ] [esdata1] cluster state updated, version [54], source [zen-disco-receive(from master [[esmaster1][gi7qMafJS4OXHk9ujvAlIQ][esmaster1][inet[/10.112.34.194:9300]]{data=false}])]

Continuing

[2016-03-25 17:17:13,086][INFO ][cluster.service ] [esdata1] detected_master [esmaster1][gi7qMafJS4OXHk9ujvAlIQ][esmaster1][inet[/10.112.34.194:9300]]{data=false}, added {[esdata5][dmiJ7MqlR_SaYck7a96cfg][esdata5][inet[/10.112.34.161:9300]]{master=false},[esdata7][5p2BpJWNQQ681cPF2f9duw][esdata7][inet[/10.112.34.163:9300]]{master=false},[esdata11][s9BHm86oQOaBh18j8HZCOA][esdata11][inet[/10.112.34.181:9300]]{master=false},[esmaster2][y6WZF9E5Rs-uaeBcojYWHQ][esmaster2][inet[/10.112.34.195:9300]]{data=false},[esdata12][Y6mI96BZQV6__5NG7lkJAA][esdata12][inet[/10.112.34.182:9300]]{master=false},[esdata9][4fYYgFa7S0KtXUKP9spbxw][esdata9][inet[/10.112.34.184:9300]]{master=false},[esmaster3][2YYaoapkQoabN1UgEUGHMQ][esmaster3][inet[/10.112.34.196:9300]]{data=false},[esdata6][Bcobu5jhSXmWUWdcqdB2vg][esdata6][inet[/10.112.34.162:9300]]{master=false},[esdata8][FlMTzCkQSIeux0bD83eCeg][esdata8][inet[/10.112.34.183:9300]]{master=false},[esclient4][_KZq2UIwRRGM383K1UWzGg][esclient4][inet[/10.112.34.193:9300]]{data=false, master=false},[esdata4][0US-i58rQH2H4aLsraG1Gw][esdata4][inet[/10.112.34.160:9300]]{master=false},[esclient3][NX9W8hKqRzOhPa92RNem0A][esclient3][inet[/10.112.34.175:9300]]{data=false, master=false},[esmaster1][gi7qMafJS4OXHk9ujvAlIQ][esmaster1][inet[/10.112.34.194:9300]]{data=false},[esclient1][-sbH82UhT3Sb0AoaDGkTNw][esclient1][inet[/10.112.34.173:9300]]{data=false, master=false},[esdata3][X2ZLf1eaT2KwEwV5YGeB3A][esdata3][inet[/10.112.34.159:9300]]{master=false},[esdata2][H2WPfE12Rk6GWXios2gitA][esdata2][inet[/10.112.34.158:9300]]{master=false},[esdata10][6kAHYTRGSE6KNBZOxTNyfQ][esdata10][inet[/10.112.34.180:9300]]{master=false},[esclient2][EmgQ8UwkQeKNnPmzUDeTOg][esclient2][inet[/10.112.34.174:9300]]{data=false, master=false},}, reason: zen-disco-receive(from master [[esmaster1][gi7qMafJS4OXHk9ujvAlIQ][esmaster1][inet[/10.112.34.194:9300]]{data=false}])

We have to go through the same exercise in several clusters. What needs to be improved in this procedure to allow the cluster to survive the restart of the original master node? Should I update discovery.zen.ping.unicast.hosts to only contain the new dedicated master nodes?
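
I.e. something like:

discovery.zen.ping.unicast.hosts: ["esmaster1", "esmaster2", "esmaster3"]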

Definitely.