Hi all,
We were running an HA Elasticsearch cluster with 3 nodes. This morning, without my knowledge, Kubernetes was upgraded and the nodes were restarted, which left the cluster in an inconsistent state. This is the error we get:
{"type": "server", "timestamp": "2020-06-10T13:05:29,995Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-ha-cluster", "node.name": "es-ha-cluster-es-nodes-0", "message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [pdgmDYn8SZil8PC54Niqjw, jEEaIPPERBGCoZcuBH3mWw, 2OBhIOXrRfq0pvPVwpv-TA], have discovered [{es-ha-cluster-es-nodes-0}{2OBhIOXrRfq0pvPVwpv-TA}{RebLHrjjT4KUOCLe0Eu7dw}{10.244.3.23}{10.244.3.23:9300}{dilm}{ml.machine_memory=30064771072, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.244.1.21:9300, 10.244.4.26:9300] from hosts providers and [{es-ha-cluster-es-nodes-0}{2OBhIOXrRfq0pvPVwpv-TA}{RebLHrjjT4KUOCLe0Eu7dw}{10.244.3.23}{10.244.3.23:9300}{dilm}{ml.machine_memory=30064771072, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 15, last-accepted version 12945 in term 15" }
{"type": "server", "timestamp": "2020-06-10T13:05:36,608Z", "level": "ERROR", "component": "o.e.x.s.a.e.NativeUsersStore", "cluster.name": "es-ha-cluster", "node.name": "es-ha-cluster-es-nodes-0", "message": "security index is unavailable. short circuiting retrieval of user [aks-ingest]" }
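The WARN line shows the core problem. The voting configuration lists three node IDs, and an election needs a majority of them (2). However, `es-ha-cluster-es-nodes-0` has discovered only itself. Below is a minimal Python sketch that extracts those numbers from the log line. The function name `quorum_status` is hypothetical, and the string matching assumes the exact message wording shown above:

```python
import json
import re

# WARN line from the post, trimmed to the fields this sketch reads.
LOG_LINE = (
    '{"level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", '
    '"node.name": "es-ha-cluster-es-nodes-0", '
    '"message": "master not discovered or elected yet, an election requires at least 2 nodes '
    'with ids from [pdgmDYn8SZil8PC54Niqjw, jEEaIPPERBGCoZcuBH3mWw, 2OBhIOXrRfq0pvPVwpv-TA], '
    'have discovered [{es-ha-cluster-es-nodes-0}{2OBhIOXrRfq0pvPVwpv-TA}{RebLHrjjT4KUOCLe0Eu7dw}'
    '{10.244.3.23}{10.244.3.23:9300}{dilm}] which is not a quorum"}'
)


def quorum_status(log_line):
    """Return (required, voting_nodes, visible_voters) parsed from a
    ClusterFormationFailureHelper warning (hypothetical helper)."""
    message = json.loads(log_line)["message"]
    # "an election requires at least N nodes with ids from [id1, id2, ...]"
    match = re.search(r"at least (\d+) nodes with ids from \[([^\]]*)\]", message)
    if match is None:
        raise ValueError("not a cluster-formation warning")
    required = int(match.group(1))
    voting_ids = [node_id.strip() for node_id in match.group(2).split(",")]
    # The "have discovered [...]" section lists the nodes this node can see;
    # each node ID appears there wrapped in braces, e.g. {2OBhIOXrRfq0pvPVwpv-TA}.
    discovered = message.split("have discovered [", 1)[1].split("] which is", 1)[0]
    visible = sum(1 for node_id in voting_ids if "{" + node_id + "}" in discovered)
    return required, len(voting_ids), visible


required, voting_nodes, visible = quorum_status(LOG_LINE)
# A quorum is a strict majority of the voting configuration: 3 // 2 + 1 == 2.
assert required == voting_nodes // 2 + 1
print(f"{visible} of {voting_nodes} voting nodes visible, {required} needed")
```

For this log line, the sketch reports 1 visible voting node out of 3, with 2 needed. In other words, the other two nodes (IDs `pdgmDYn8SZil8PC54Niqjw` and `jEEaIPPERBGCoZcuBH3mWw`) are not reachable from node 0.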
The cluster has been in this state since this morning. I have been trying to remove nodes by following the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/add-elasticsearch-nodes.html
However, the API call fails with the following error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
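The 503 body reports the same condition as the WARN log: no node has been elected master. Requests that have to be handled by the elected master, such as the voting configuration exclusions used when removing master-eligible nodes, presumably cannot succeed until a master is elected. Below is a minimal Python sketch for recognising this specific failure in a script. It assumes the response body is available as a string, and `no_master_elected` is a hypothetical helper name:

```python
import json

# Response body from the post, as returned by the failing API call.
ERROR_BODY = """
{
  "error" : {
    "root_cause" : [
      { "type" : "master_not_discovered_exception", "reason" : null }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
"""


def no_master_elected(body):
    """True if an Elasticsearch error body reports that no master is elected."""
    payload = json.loads(body)
    error = payload.get("error")
    # Some error bodies carry a plain string instead of an object; treat those as "other".
    if not isinstance(error, dict):
        return False
    return (
        payload.get("status") == 503
        and error.get("type") == "master_not_discovered_exception"
    )


print(no_master_elected(ERROR_BODY))
print(no_master_elected('{"acknowledged" : true}'))
```

Checking both the HTTP status and the error `type` keeps this case separate from other 503 responses, which can have different causes.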
Can anyone please help me with this?
Thanks!