Elasticsearch master pods failing: "master not discovered or elected yet, an election requires at least 2 nodes"

Hello, I'm facing an issue with my Elasticsearch cluster. I have 15 pods running across multiple nodes on my Kubernetes cluster: 3 elasticsearch-master pods and 12 elasticsearch-data pods.
We are facing an issue with the master pods: only 1 master pod is running, it is not discovering the remaining 2 master pods, and the cluster is in an out-of-sync state.
Please find the logs below:

[2024-08-27T05:52:13,612][WARN ][o.e.c.c.ClusterFormationFailureHelper] [twamp-es-master-1] master not discovered yet, this node has not previously joined a bootstrapped cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{twamp-es-master-1}{iDt5Fo70ShKuNgEaopAtjQ}{TYFK2HM1TWW0i-0cQhCPIQ}{twamp-es-master-1}{192.168.14.47}{192.168.14.47:9300}{mr}{8.9.1}, {twamp-es-master-0}{k4_DXeaGQY6cjpOyMcqkgg}{ZBOZCjtMTTGKlr8SVXfeiQ}{twamp-es-master-0}{192.168.7.158}{192.168.7.158:9300}{mr}{8.9.1}, {twamp-es-master-2}{kBAN4EF7TzaUQmSsODP8aA}{jV7DVxERQLqba5TsMvsRfQ}{twamp-es-master-2}{192.168.18.23}{192.168.18.23:9300}{mr}{8.9.1}]; discovery will continue using [192.168.18.23:9300, 192.168.7.158:9300] from hosts providers and [{twamp-es-master-1}{iDt5Fo70ShKuNgEaopAtjQ}{TYFK2HM1TWW0i-0cQhCPIQ}{twamp-es-master-1}{192.168.14.47}{192.168.14.47:9300}{mr}{8.9.1}] from last-known cluster state; node term 0, last-accepted version 0 in term 0; for troubleshooting guidance, see https://www.elastic.co/guide/en/elasticsearch/reference/8.9/discovery-troubleshooting.html
at org.elasticsearch.server@8.9.1/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:316)
        at org.elasticsearch.server@8.9.1/org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:355)
        at org.elasticsearch.server@8.9.1/org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:293)
        at org.elasticsearch.server@8.9.1/org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:642)
        at org.elasticsearch.server@8.9.1/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1623) 

Do all the master nodes have persistent storage? How are they configured?

Yes, all the master nodes have persistent storage. We actually wanted to expand the nodes' capacity, so we deleted the PVCs and the pods restarted with the updated, expanded capacity. But after that the master pods are again giving us the error:
"master not discovered yet, this node has not previously joined a bootstrapped cluster, and [cluster.initial_master_nodes] is empty on this node:"

It looks to me like the cluster state that was saved on disk, which contained the information about the other nodes, has been lost, and you have apparently removed the cluster.initial_master_nodes setting from the config (as recommended). I suspect you need to set the cluster up again as a new cluster, which will cause data loss unless you have a snapshot available.
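If you do go that route, "setting it up again as a new cluster" roughly means starting the masters once with cluster.initial_master_nodes listing all three of them, and removing the setting again after the cluster has formed. A minimal sketch, assuming the official image's default config path and using the master node names from your logs:

```
# Sketch only: for the very first start of a brand-new cluster, list all
# master-eligible node names; remove this setting again once the cluster
# has formed. Path assumes the official Elasticsearch image defaults.
cat >> /usr/share/elasticsearch/config/elasticsearch.yml <<'EOF'
cluster.initial_master_nodes:
  - twamp-es-master-0
  - twamp-es-master-1
  - twamp-es-master-2
EOF
```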

Is there any solution to recover the data if we still have the shards available in the PV/PVC? We don't have snapshots available.

I believe the elasticsearch-node utility was designed for this type of unsafe recovery, but I have no idea how you would be able to use it with a cluster on Kubernetes. As I believe you have lost all the master nodes, it is likely the data is lost and cannot be recovered.
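For reference, on a plain installation the unsafe recovery with that tool looks roughly like the sketch below. It only helps if at least one master-eligible node still has its on-disk cluster state, which does not sound like your situation, and I don't know how you would stop the Elasticsearch process inside a pod while keeping its volume mounted to run it:

```
# Sketch only: both commands must be run while the Elasticsearch process on
# that node is stopped, from the Elasticsearch home directory.

# On one surviving master-eligible node: promote its on-disk state into a
# new cluster.
bin/elasticsearch-node unsafe-bootstrap

# On every other node (e.g. the data nodes): detach from the old cluster so
# they can join the newly bootstrapped one.
bin/elasticsearch-node detach-cluster
```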

I do not know much about Kubernetes, but isn't the PVC the persistent storage? If you deleted it, you deleted the data for the node.
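For what it's worth, a PVC can usually be grown in place instead of being deleted, provided the StorageClass has allowVolumeExpansion: true. A sketch, with the PVC name and target size as placeholders:

```
# Sketch only: expand an existing PVC in place; requires a StorageClass with
# allowVolumeExpansion: true. Name and size below are placeholders.
kubectl patch pvc data-twamp-es-master-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```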

The error is consistent with Elasticsearch starting with an empty data dir.
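You can check that directly; with the official image the data path defaults to /usr/share/elasticsearch/data, so something like this (pod name taken from your logs) shows whether any state is left:

```
# Sketch only: list the node's data path to see whether any cluster state or
# index data survived. Path assumes the official image's default data location.
kubectl exec twamp-es-master-1 -- ls -la /usr/share/elasticsearch/data
```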
