Resolving master node election issues

Hello,

I'm using ECK (Elastic Cloud on Kubernetes) to run Elasticsearch. Right now the cluster reports no known master node and master election attempts keep failing. How do I fix this? Relevant logs are included below:

{"type": "server", "timestamp": "2022-08-26T22:53:57,606Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "path: /doc/_doc/0NilpipTWExu9NZSnfm84h, params: {index=doc, id=0NilpipTWExu9NZSnfm84h}", 
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized, SERVICE_UNAVAILABLE/2/no master];",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:189) ~[elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.handleBlockExceptions(TransportBulkAction.java:535) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:415) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$2.onTimeout(TransportBulkAction.java:569) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:598) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.6.1.jar:7.6.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:830) [?:?]",
"Suppressed: org.elasticsearch.discovery.MasterNotDiscoveredException",
"\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:220) ~[elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:598) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"\tat java.lang.Thread.run(Thread.java:830) [?:?]"] }
{"type": "server", "timestamp": "2022-08-26T22:53:59,390Z", "level": "DEBUG", "component": "o.e.a.s.m.TransportMasterNodeAction", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "no known master node, scheduling a retry" }
{"type": "server", "timestamp": "2022-08-26T22:54:05,076Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [4pSTqftATSOOKpem68qgxw, VTPIvaO-S6K9QhSVCQyvyw, UR0zf72tRQGizayjKB98aw], have discovered [{elasticsearch-es-default-0}{VTPIvaO-S6K9QhSVCQyvyw}{uyaTQXmKRSq8mjCbZF1oiQ}{10.28.2.8}{10.28.2.8:9300}{dilm}{ml.machine_memory=6442450944, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.28.0.8:9300, 10.28.1.8:9300, 10.28.2.9:9300] from hosts providers and [{elasticsearch-es-default-0}{VTPIvaO-S6K9QhSVCQyvyw}{uyaTQXmKRSq8mjCbZF1oiQ}{10.28.2.8}{10.28.2.8:9300}{dilm}{ml.machine_memory=6442450944, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 27, last-accepted version 13391 in term 27" }

Calling GET /_cat/health returns this:

{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}

How many nodes does your cluster have, and how many of them are master-eligible?
What does your configuration (the Elasticsearch manifest) look like?
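
For reference, a three-node master-eligible setup on ECK usually looks something like this (a minimal sketch, not your actual manifest; the cluster name, nodeSet name, and version are guessed from your pod name and log output):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch            # guessed from the pod name elasticsearch-es-default-0
spec:
  version: 7.6.1                 # guessed from the jar versions in your stack trace
  nodeSets:
  - name: default                # guessed from the pod name
    count: 3                     # your log lists three master-eligible node IDs in the voting configuration
    config:
      node.master: true          # the 7.x default, shown here for clarity
      node.data: true

Your log shows three master-eligible node IDs in the voting configuration but only one node discovered, so it would help to know whether the node count was scaled down or whether the other two pods are failing to start.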
