Elasticsearch nodes did not elect master even after failure to discover master


(Vachan D A) #1

We recently faced an AWS network outage on one of the 3 nodes in our ES cluster. Node ES2 announced itself as master and added the other two nodes, ES1 and ES3, to the cluster. Those nodes, however, failed to register with the master because of the network issue, and they neither elected a new master nor declared themselves master.

We use Elasticsearch 1.4.4 on Ubuntu 14.04 LTS, hosted on AWS m4.large instances.

The Elasticsearch configuration is:

cluster.name: example
bootstrap.mlockall: true
discovery.zen.minimum_master_nodes: 1
discovery.type: ec2
discovery.zen.ping.multicast.enabled: false
discovery.ec2.groups: es
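
For reference, the Elasticsearch documentation for zen discovery recommends setting `discovery.zen.minimum_master_nodes` to a majority of the master-eligible nodes, i.e. (N / 2) + 1. The snippet below is only a minimal sketch of that arithmetic, assuming all 3 of our nodes are master-eligible; the function name `quorum` is made up for illustration, and this is not meant as a diagnosis of the behaviour described here.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Return the majority of master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    # Assumption: all 3 nodes (ES1, ES2, ES3) are master-eligible.
    nodes = 3
    print(f"discovery.zen.minimum_master_nodes: {quorum(nodes)}")
```

With 3 master-eligible nodes this gives 2, whereas our configuration above uses 1.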

This is the log entry in which the master node (ES2) detects and adds ES3:

[2015-10-21 15:59:58,612][INFO ][cluster.service          ] [ES2.localdomain] added {[ES3.localdomain][Vbbga0gMTVS43SIu_QLpUw][ES3.localdomain][inet[/10.0.0.103:9300]]{aws_availability_zone=ap-southeast-1a, max_local_storage_nodes=1},}, reason: zen-disco-receive(join from node[[ES1.localdomain][X9u5UwmYSg-pGB7PdGNFdA][ES1.localdomain][inet[/10.0.0.219:9300]]{aws_availability_zone=ap-southeast-1a, max_local_storage_nodes=1}])

These are the log entries on ES3, which keeps retrying to find a master over and over again:

> [2015-10-21 16:00:06,181][DEBUG][action.admin.cluster.health] [ES3.localdomain] no known master node, scheduling a retry
> [2015-10-21 16:00:36,182][DEBUG][action.admin.cluster.health] [ES3.localdomain] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
> [2015-10-21 16:01:06,190][DEBUG][action.admin.cluster.health] [ES3.localdomain] no known master node, scheduling a retry
> [2015-10-21 16:01:36,191][DEBUG][action.admin.cluster.health] [ES3.localdomain] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]

Any idea why master re-election didn’t happen, and why ES3 kept trying to detect a master instead of declaring itself the new master?

