Hello,
I'm running Elasticsearch on Kubernetes with the ECK (Elastic Cloud on Kubernetes) operator. Right now it looks like the cluster has no known master node and master election attempts keep failing. How do I fix this?
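For context, this is roughly how I'm checking on the cluster from the Kubernetes side (a sketch; the resource name, label, and pod names are my best guesses from the cluster.name and node.name in the logs below, assuming a three-node nodeSet called default):

  # ECK exposes an elasticsearch custom resource with a HEALTH column
  kubectl get elasticsearch elasticsearch
  # list the pods the operator created for this cluster (label is set by ECK)
  kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=elasticsearch

Some logs from the elasticsearch-es-default-0 pod are included below: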
{"type": "server", "timestamp": "2022-08-26T22:53:57,606Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "path: /doc/_doc/0NilpipTWExu9NZSnfm84h, params: {index=doc, id=0NilpipTWExu9NZSnfm84h}",
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized, SERVICE_UNAVAILABLE/2/no master];",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:189) ~[elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.handleBlockExceptions(TransportBulkAction.java:535) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:415) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$2.onTimeout(TransportBulkAction.java:569) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:598) [elasticsearch-7.6.1.jar:7.6.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.6.1.jar:7.6.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:830) [?:?]",
"Suppressed: org.elasticsearch.discovery.MasterNotDiscoveredException",
"\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:220) ~[elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:598) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.6.1.jar:7.6.1]",
"\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"\tat java.lang.Thread.run(Thread.java:830) [?:?]"] }
{"type": "server", "timestamp": "2022-08-26T22:53:59,390Z", "level": "DEBUG", "component": "o.e.a.s.m.TransportMasterNodeAction", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "no known master node, scheduling a retry" }
{"type": "server", "timestamp": "2022-08-26T22:54:05,076Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [4pSTqftATSOOKpem68qgxw, VTPIvaO-S6K9QhSVCQyvyw, UR0zf72tRQGizayjKB98aw], have discovered [{elasticsearch-es-default-0}{VTPIvaO-S6K9QhSVCQyvyw}{uyaTQXmKRSq8mjCbZF1oiQ}{10.28.2.8}{10.28.2.8:9300}{dilm}{ml.machine_memory=6442450944, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.28.0.8:9300, 10.28.1.8:9300, 10.28.2.9:9300] from hosts providers and [{elasticsearch-es-default-0}{VTPIvaO-S6K9QhSVCQyvyw}{uyaTQXmKRSq8mjCbZF1oiQ}{10.28.2.8}{10.28.2.8:9300}{dilm}{ml.machine_memory=6442450944, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 27, last-accepted version 13391 in term 27" }
When I query /_cat/health, I see:
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}