We recently upgraded to 2.4.2 and are now seeing nodes mysteriously leaving and rejoining the cluster. That is, the cluster goes yellow for a while and then returns to green.
In the log on the master node I see this:
[2016-12-21 07:20:41,821][WARN ][cluster.action.shard ] [es-150e.foo.bar] [reference_2015-04-01_2][5] received shard failed for target shard [[reference_2015-04-01_2][5], node[-wygWN2DQG6FibS7MoW25g], [R], v[183], s[STARTED], a[id=Qpn3IcmGQPWMlotMpyZPYg]], indexUUID [MNw2RvCPSTeEnNNJYoUIxw], message [failed to perform indices:data/write/bulk[s] on replica on node {es-247d.foo.bar}{-wygWN2DQG6FibS7MoW25g}{10.0.69.125}{10.0.69.125:9300}{aws_availability_zone=us-east-1d, index_set=reference_partitioned, max_local_storage_nodes=1, master=false}], failure [NodeDisconnectedException[[es-247d.foo.bar][10.0.69.125:9300][indices:data/write/bulk[s][r]] disconnected]]
NodeDisconnectedException[[es-247d.foo.bar][10.0.69.125:9300][indices:data/write/bulk[s][r]] disconnected]
[2016-12-21 07:20:41,831][INFO ][cluster.routing.allocation] [es-150e.foo.bar] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[reference_2015-04-01_2][5]] ...]).
[2016-12-21 07:54:50,472][INFO ][cluster.routing.allocation] [es-150e.foo.bar] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[reference_2015-04-01_2][5]] ...]).
The strange thing is that I do not see any relevant log entries on the affected node (in this case es-247d). During the time the cluster was yellow, that node only logged a few lines like this:
[2016-12-21 07:28:20,360][WARN ][index.fielddata ] [es-247d.foo.bar] [reference_2015-09-22_1] failed to find format [compressed] for field [attributes.document_position], will use default
That is a separate problem, which we will fix.
But can somebody explain to me why the cluster went yellow for a while? There were no interruptions in network traffic or spikes in CPU usage. We have also seen the same behavior at other times, then involving other nodes.