I have an Elasticsearch 6.2.4 cluster running inside Kubernetes, with 3 master nodes and 3 data nodes, using zen discovery. The cluster is idle. I am testing some failure scenarios and trying to get a handle on some odd behavior.
discovery.zen.minimum_master_nodes is set to 2.
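For reference, the node roles and quorum setting look roughly like this in elasticsearch.yml on the master pods (a minimal sketch; the exact layout is illustrative, not copied verbatim from my config):
# elasticsearch.yml on the master pods (sketch)
node.master: true                        # dedicated master-eligible node
node.data: false
node.ingest: false
# quorum of master-eligible nodes: floor(3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
# the data pods use the inverse roles (node.master: false, node.data: true)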
The cluster operates normally and handles node failure without fanfare, except when the elected master node is stopped; then all hell breaks loose until that node comes back.
Here, es-master-2, the previously elected master, is shut down normally:
[2018-06-16T15:52:28,847][INFO ][o.e.n.Node ] [es-master-2] stopping ...
[2018-06-16T15:52:28,941][INFO ][o.e.x.w.WatcherService ] [es-master-2] stopping watch service, reason [shutdown initiated]
[2018-06-16T15:52:29,207][INFO ][o.e.n.Node ] [es-master-2] stopped
[2018-06-16T15:52:29,208][INFO ][o.e.n.Node ] [es-master-2] closing ...
[2018-06-16T15:52:29,221][INFO ][o.e.n.Node ] [es-master-2] closed
and a few seconds later, es-master-0 seems to think es-master-1 is now the leader:
[2018-06-16T15:52:32,004][INFO ][o.e.c.s.ClusterApplierService] [es-master-0] detected_master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300}, reason: apply cluster state (from master [master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300} committed version [66]])
and es-master-1 seems to agree:
[2018-06-16T15:53:02,001][INFO ][o.e.c.s.ClusterApplierService] [es-master-1] new_master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300}, reason: apply cluster state (from master [master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300} committed version [66] source [zen-disco-elected-as-master ([1] nodes joined)[{es-master-0}{kJBd1zqaQUGvUgOKEpK3BA}{kvzS_rXuSLip-iC-WMnJlQ}{10.2.5.134}{10.2.5.134:9300}]]])
However, a few seconds after that, both remaining master nodes start throwing exceptions about being unable to connect to the missing es-master-2 node:
[2018-06-16T15:53:02,011][WARN ][o.e.c.NodeConnectionsService] [es-master-0] failed to connect to node {es-master-2}{bZzPyOqeT1SmVMYkRzoEng}{95EaEY6nR6CWg6nMZ4TqvA}{10.2.4.112}{10.2.4.112:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [es-master-2][10.2.4.112:9300] connect_timeout[30s]
and
[2018-06-16T15:53:05,570][WARN ][o.e.c.NodeConnectionsService] [es-master-1] failed to connect to node {es-master-2}{bZzPyOqeT1SmVMYkRzoEng}{95EaEY6nR6CWg6nMZ4TqvA}{10.2.4.112}{10.2.4.112:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [es-master-2][10.2.4.112:9300] connect_timeout[30s]
(which is probably ok, as they are just trying to reconnect to a node that is still down...)
but what is not OK is that they then also start getting cluster state timeout failures, e.g. on es-master-0:
[2018-06-16T15:53:02,104][WARN ][o.e.c.s.ClusterApplierService] [es-master-0] cluster state applier task [apply cluster state (from master [master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300} committed version [66]])] took [30s] above the warn threshold of 30s
[2018-06-16T15:53:31,949][INFO ][o.e.d.z.ZenDiscovery ] [es-master-0] failed to send join request to master [{es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-06-16T15:53:35,761][INFO ][o.e.c.s.ClusterApplierService] [es-master-0] removed {{es-master-2}{bZzPyOqeT1SmVMYkRzoEng}{95EaEY6nR6CWg6nMZ4TqvA}{10.2.4.112}{10.2.4.112:9300},{es-data-2}{EI8p9YoYSvm1kK5rueMY1A}{fUqNGOBkQLC5HAk4vYix5g}{10.2.4.44}{10.2.4.44:9300},{es-data-1}{W0j6dEurSN6uh7eKog-DtA}{UmdxdduOTAiybh-Uf1ih7Q}{10.2.5.85}{10.2.5.85:9300},{es-data-0}{O47gtbITQBmKnQ1UBw_IuA}{nwUyzpouR0KoozNiRoIbXg}{10.2.4.101}{10.2.4.101:9300},}, reason: apply cluster state (from master [master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300} committed version [67]])
as well as failures on es-master-1:
[2018-06-16T15:53:01,999][WARN ][o.e.d.z.PublishClusterStateAction] [es-master-1] timed out waiting for all nodes to process published state [66] (timeout [30s], pending nodes: [{es-data-1}{W0j6dEurSN6uh7eKog-DtA}{UmdxdduOTAiybh-Uf1ih7Q}{10.2.5.85}{10.2.5.85:9300}, {es-data-2}{EI8p9YoYSvm1kK5rueMY1A}{fUqNGOBkQLC5HAk4vYix5g}{10.2.4.44}{10.2.4.44:9300}, {es-data-0}{O47gtbITQBmKnQ1UBw_IuA}{nwUyzpouR0KoozNiRoIbXg}{10.2.4.101}{10.2.4.101:9300}, {es-master-0}{kJBd1zqaQUGvUgOKEpK3BA}{kvzS_rXuSLip-iC-WMnJlQ}{10.2.5.134}{10.2.5.134:9300}])
[2018-06-16T15:53:02,001][INFO ][o.e.c.s.ClusterApplierService] [es-master-1] new_master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300}, reason: apply cluster state (from master [master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300} committed version [66] source [zen-disco-elected-as-master ([1] nodes joined)[{es-master-0}{kJBd1zqaQUGvUgOKEpK3BA}{kvzS_rXuSLip-iC-WMnJlQ}{10.2.5.134}{10.2.5.134:9300}]]])
[2018-06-16T15:53:35,684][WARN ][o.e.c.s.ClusterApplierService] [es-master-1] cluster state applier task [apply cluster state (from master [master {es-master-1}{MJeNWUFgT7yNk-_8w1LZyQ}{uxnRZt7ARYGm7ASYNMMDwg}{10.2.4.223}{10.2.4.223:9300} committed version [66] source [zen-disco-elected-as-master ([1] nodes joined)[{es-master-0}{kJBd1zqaQUGvUgOKEpK3BA}{kvzS_rXuSLip-iC-WMnJlQ}{10.2.5.134}{10.2.5.134:9300}]]])] took [33.6s] above the warn threshold of 30s
[2018-06-16T15:53:35,690][WARN ][o.e.c.s.MasterService ] [es-master-1] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[{es-master-0}{kJBd1zqaQUGvUgOKEpK3BA}{kvzS_rXuSLip-iC-WMnJlQ}{10.2.5.134}{10.2.5.134:9300}]] took [1m] above the warn threshold of 30s
At this point the entire cluster becomes unstable: data nodes start dropping out as well. The cluster stays red and unstable until the es-master-2 node is back up and running, at which point things restabilize.
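For context, I believe the repeated 30s values in the logs just correspond to the 6.x defaults for the zen-discovery and transport timeouts; I have not overridden any of these (listed below only to show where the 30s comes from, as far as I understand the defaults):
# 6.x defaults (not overridden here), as I understand them
transport.tcp.connect_timeout: 30s       # the connect_timeout[30s] in the ConnectTransportException
discovery.zen.publish_timeout: 30s       # "timed out waiting for all nodes to process published state"
discovery.zen.commit_timeout: 30s
discovery.zen.fd.ping_timeout: 30s       # master/node fault-detection ping timeout
discovery.zen.fd.ping_retries: 3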
The full logs from the three masters are here: https://drive.google.com/open?id=16R3eSgfuIDUqLqqkzuM_jHLxIiwTH4Zc
Discovery (discovery.zen.ping.unicast.hosts) currently points to a DNS entry that resolves to each of the masters, but I've also tried it with each master node listed explicitly as a comma-delimited list, and the behavior is effectively the same (see the config sketch after the nslookup output below):
# nslookup elasticsearch-discovery
[...]
Name: elasticsearch-discovery.default.svc.cluster.local
Address: 10.2.4.223
Address: 10.2.4.112
Address: 10.2.5.134
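Both variants of the discovery setting I've tried look roughly like this in elasticsearch.yml (a sketch; the service name matches the nslookup above, and the explicit hostnames are illustrative placeholders for the real pod DNS names):
# variant 1: single DNS name (headless service) resolving to all three masters
discovery.zen.ping.unicast.hosts: ["elasticsearch-discovery"]
# variant 2: each master listed explicitly (hostnames are illustrative placeholders)
# discovery.zen.ping.unicast.hosts: ["es-master-0", "es-master-1", "es-master-2"]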