I am investigating why our production clusters go from GREEN to YELLOW. Specifically, I am confused by two kinds of messages logged by the master node:
[es-m01-rm] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{es-d05-rm}{LyFGgvX1SpKWaWEPi4S_aQ}{2NZ6hZh_Q6qR7ytG9AeL1w}{192.168.0.155}{192.168.0.155:9300}{faultDomain=0, updateDomain=4} failed to ping, tried [3] times, each with maximum [30s] timeout]).
[es-m11-rm] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{es-d14-rm}{Fm1TcGXaQge1Ys3ZKHckaw}{D2i_vY7zQX-C3UKYJcvzrw}{30.0.0.164}{30.0.0.164:9300}{faultDomain=0, updateDomain=0} transport disconnected]).
While I understand what the first error means, I'm not sure how to interpret the second. My guess was that all connectivity errors would take the form of the first error, where the data node would ping the master node 3 times and mark it as out of the cluster after failing to receive a response to any of the 3 pings. How is the second error different from the first?
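To see how often each kind of reason occurs, the master logs could be bucketed with something like the Python sketch below. It is only a rough illustration: the function name `classify_reason` and the labels `ping_timeout` and `transport_disconnected` are made up here, and the pattern is derived solely from the two log lines quoted above, so other Elasticsearch versions or other reasons may not match it.

```python
import re

# Pattern inferred from the two sample lines above (an assumption, not an
# official log format): master name, old/new status, free-text reason.
HEALTH_CHANGE = re.compile(
    r"^\[(?P<master>[^\]]+)\] Cluster health status changed from "
    r"\[(?P<old>\w+)\] to \[(?P<new>\w+)\] \(reason: \[(?P<reason>.*)\]\)\.?$"
)


def classify_reason(line):
    """Return master, affected node and a hypothetical reason label, or None."""
    match = HEALTH_CHANGE.match(line.strip())
    if match is None:
        return None  # not a health-change line in the expected shape

    reason = match.group("reason")
    # The reason starts with {node-name}{id}{...}; take the first braced field.
    node = re.match(r"\{([^}]+)\}", reason)

    if "failed to ping" in reason:
        kind = "ping_timeout"            # first kind of message
    elif "transport disconnected" in reason:
        kind = "transport_disconnected"  # second kind of message
    else:
        kind = "other"

    return {
        "master": match.group("master"),
        "node": node.group(1) if node else None,
        "from": match.group("old"),
        "to": match.group("new"),
        "kind": kind,
    }
```

Running `classify_reason` over each line of the master log and counting the `kind` values per `node` would show whether the two messages come from the same data nodes or from different ones.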