Understanding reasons for cluster going to yellow state

bsarkar · April 12, 2018, 12:39pm

I am investigating why our production clusters go from GREEN to YELLOW state. Specifically, I am confused by two kinds of messages logged by the master node:

[es-m01-rm] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{es-d05-rm}{LyFGgvX1SpKWaWEPi4S_aQ}{2NZ6hZh_Q6qR7ytG9AeL1w}{192.168.0.155}{192.168.0.155:9300}{faultDomain=0, updateDomain=4} failed to ping, tried [3] times, each with maximum [30s] timeout]).
[es-m11-rm] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{es-d14-rm}{Fm1TcGXaQge1Ys3ZKHckaw}{D2i_vY7zQX-C3UKYJcvzrw}{30.0.0.164}{30.0.0.164:9300}{faultDomain=0, updateDomain=0} transport disconnected]).

While I understand what the first error means, I'm not sure how to interpret the second. My guess was that all connectivity errors would take the form of the first error, where the data node would ping the master node 3 times and be marked out of the cluster after failing to receive a ping response all 3 times. How is the second error different from the first?

DavidTurner · April 15, 2018, 11:47am

Neither indicates a connectivity error for certain, although connectivity is one possible cause. The first indicates that there's an open connection between the two nodes but "ping" messages are not receiving responses, which might be due to packet loss, a network partition, or the node running very slowly (e.g. under GC pressure). The second indicates that the connection between the two nodes was actively closed, which might be because the remote node was stopped, but could also be the action of something (e.g. a firewall) that sits between the two nodes.
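
To see which of the two cases dominates in your logs, it may help to tally them per node. Below is a minimal Python sketch, assuming the master log lines look exactly like the two quoted in the question. The regular expression, the `classify_reason` function, and the `ping_timeout` / `transport_disconnected` labels are made up for illustration and are not part of Elasticsearch.

```python
import re

# Assumption: health-change lines follow the exact shape quoted in this thread.
HEALTH_CHANGE = re.compile(
    r"^\[(?P<master>[^\]]+)\] Cluster health status changed from "
    r"\[(?P<old>\w+)\] to \[(?P<new>\w+)\] "
    r"\(reason: \[(?P<reason>.*)\]\)\.?$"
)

def classify_reason(line):
    """Return a dict describing a health-change line, or None if it does not match."""
    match = HEALTH_CHANGE.match(line.strip())
    if match is None:
        return None
    reason = match.group("reason")
    # The first {...} group in the reason is the name of the affected node.
    node = re.match(r"\{([^}]+)\}", reason)
    if "failed to ping" in reason:
        # Connection still open, but ping requests went unanswered.
        kind = "ping_timeout"
    elif "transport disconnected" in reason:
        # Connection was actively closed (node stopped, firewall, ...).
        kind = "transport_disconnected"
    else:
        kind = "other"
    return {
        "master": match.group("master"),
        "node": node.group(1) if node else None,
        "kind": kind,
    }
```

Feeding each line of the master log through `classify_reason` and counting the `(node, kind)` pairs should show whether a particular node mostly times out on pings (look at GC and load on that node) or mostly gets disconnected (look at restarts and anything sitting on the network path).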

