Hi, I am running an Elasticsearch cluster (8.7.0) on Kubernetes with 1 master, 1 client, and 3 data nodes.
After a restart of the master node, the other nodes cannot discover the master again.
This is from the data node's log:
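In case it matters, the discovery-related settings follow the usual pattern, roughly like the sketch below (simplified, not my full config; 10.102.87.156 is the ClusterIP of the discovery service that shows up in the logs further down):

```yaml
# elasticsearch.yml (rough sketch only, not the complete configuration)
cluster.name: elasticsearch
discovery.seed_hosts:
  - 10.102.87.156:9300    # Kubernetes service in front of the master node
# on the master node only:
node.roles: [ master ]
```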
{"@timestamp":"2023-05-31T12:38:36.905Z", "log.level": "WARN", "message":"master not discovered yet: have discovered [{elasticsearch-data}{XLhtbNSQQOG4F6a-luPJ7Q}{QNGqb2xxTzagpLE6BPQhYA}{elasticsearch-data}{192.168.217.75}{192.168.217.75:9300}{d}{8.7.0}, {elasticsearch-master}{9hvRUjvsTXeVF-NIEwbZQA}{2hbkEtdwTIy7DBAYI1AxOw}{elasticsearch-master}{192.168.247.28}{192.168.247.28:9300}{m}{8.7.0}]; discovery will continue using [10.102.87.156:9300] from hosts providers and [{elasticsearch-master}{Jwgz0LUATzyyQk4qvU292g}{OSmdhhayQguwVZwmj3c3fw}{elasticsearch-master}{192.168.84.131}{192.168.84.131:9300}{m}{8.7.0}] from last-known cluster state; node term 1, last-accepted version 71 in term 1; for troubleshooting guidance, see https://www.elastic.co/guide/en/elasticsearch/reference/8.7/discovery-troubleshooting.html", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-data][cluster_coordination][T#1]","log.logger":"org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper","elasticsearch.cluster.uuid":"hKpt2rS0TbG1z2PjWRtVnQ","elasticsearch.node.id":"XLhtbNSQQOG4F6a-luPJ7Q","elasticsearch.node.name":"elasticsearch-data","elasticsearch.cluster.name":"elasticsearch"}
{"@timestamp":"2023-05-31T12:38:45.163Z", "log.level": "WARN", "message":"monitoring execution failed", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-data][write][T#6]","log.logger":"org.elasticsearch.xpack.monitoring.MonitoringService","elasticsearch.cluster.uuid":"hKpt2rS0TbG1z2PjWRtVnQ","elasticsearch.node.id":"XLhtbNSQQOG4F6a-luPJ7Q","elasticsearch.node.name":"elasticsearch-data","elasticsearch.cluster.name":"elasticsearch","error.type":"org.elasticsearch.xpack.monitoring.exporter.ExportException","error.message":"failed to flush export bulks","error.stack_trace":"org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks\n\tat org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.lambda$doFlush$0(ExportBulk.java:110)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:175)\n\tat org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$doFlush$1(LocalBulk.java:114)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:175)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.support.ContextPreservingActionListener.onFailure(ContextPreservingActionListener.java:38)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onFailure(NodeClient.java:170)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.tasks.TaskManager$1.onFailure(TaskManager.java:218)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.support.ContextPreservingActionListener.onFailure(ContextPreservingActionListener.java:38)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$Delegating.onFailure(ActionListener.java:97)\n\tat 
org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$Delegating.onFailure(ActionListener.java:97)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$RunBeforeActionListener.onFailure(ActionListener.java:450)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionRunnable.onFailure(ActionRunnable.java:92)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.retry(TransportBulkAction.java:685)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.handleBlockExceptions(TransportBulkAction.java:672)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:541)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:891)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\nCaused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulk [default_local]\n\t... 
20 more\nCaused by: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:177)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.handleBlockExceptions(TransportBulkAction.java:668)\n\t... 8 more\n"}
{"@timestamp":"2023-05-31T12:38:45.423Z", "log.level": "WARN", "message":"failed to connect to {elasticsearch-master}{Jwgz0LUATzyyQk4qvU292g}{OSmdhhayQguwVZwmj3c3fw}{elasticsearch-master}{192.168.84.131}{192.168.84.131:9300}{m}{8.7.0}{xpack.installed=true} (tried [67] times)", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-data][generic][T#3]","log.logger":"org.elasticsearch.cluster.NodeConnectionsService","elasticsearch.cluster.uuid":"hKpt2rS0TbG1z2PjWRtVnQ","elasticsearch.node.id":"XLhtbNSQQOG4F6a-luPJ7Q","elasticsearch.node.name":"elasticsearch-data","elasticsearch.cluster.name":"elasticsearch","error.type":"org.elasticsearch.transport.ConnectTransportException","error.message":"[elasticsearch-master][192.168.84.131:9300] connect_exception","error.stack_trace":"org.elasticsearch.transport.ConnectTransportException: [elasticsearch-master][192.168.84.131:9300] connect_exception\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1151)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:502)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListenerDirectly(ListenableFuture.java:111)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:100)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:149)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:139)\n\tat org.elasticsearch.transport.netty4@8.7.0/org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:62)\n\tat 
io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)\n\tat org.elasticsearch.security@8.7.0/org.elasticsearch.xpack.core.security.transport.netty4.SecurityNetty4Transport$ClientSslHandlerInitializer.lambda$connect$1(SecurityNetty4Transport.java:267)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)\n\tat io.netty.transport@4.1.86.Final/io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:262)\n\tat 
io.netty.common@4.1.86.Final/io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat io.netty.transport@4.1.86.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)\n\tat io.netty.common@4.1.86.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.common@4.1.86.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\nCaused by: org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: Failed execution\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:80)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:72)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$notifyListenerDirectly$0(ListenableFuture.java:111)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:499)\n\t... 
30 more\nCaused by: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: 192.168.84.131/192.168.84.131:9300\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:257)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:231)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:53)\n\tat org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:65)\n\t... 32 more\nCaused by: io.netty.channel.ConnectTimeoutException: connection timed out: 192.168.84.131/192.168.84.131:9300\n\tat io.netty.transport@4.1.86.Final/io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261)\n\t... 9 more\n"}
This is from the master node's log:
{"@timestamp":"2023-05-31T12:52:25.306Z", "log.level": "WARN", "message":"address [10.102.87.156:9300], node [null], requesting [false] discovery result: [elasticsearch-master][192.168.247.28:9300] successfully discovered local node {elasticsearch-master}{9hvRUjvsTXeVF-NIEwbZQA}{2hbkEtdwTIy7DBAYI1AxOw}{elasticsearch-master}{192.168.247.28}{192.168.247.28:9300}{m}{8.7.0} at [10.102.87.156:9300]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-master][generic][T#2]","log.logger":"org.elasticsearch.discovery.PeerFinder","elasticsearch.node.name":"elasticsearch-master","elasticsearch.cluster.name":"elasticsearch"}
This is from the client node's log:
{"@timestamp":"2023-05-31T12:53:07.147Z", "log.level": "WARN", "message":"master not discovered yet: have discovered [{elasticsearch-client}{PSXY08nNT1S8KWVH6qKBaA}{OAPLtfaUQXmTN0TriZl-7A}{elasticsearch-client}{192.168.84.134}{192.168.84.134:9300}{8.7.0}, {elasticsearch-master}{9hvRUjvsTXeVF-NIEwbZQA}{2hbkEtdwTIy7DBAYI1AxOw}{elasticsearch-master}{192.168.247.28}{192.168.247.28:9300}{m}{8.7.0}]; discovery will continue using [10.102.87.156:9300] from hosts providers and [{elasticsearch-master}{Jwgz0LUATzyyQk4qvU292g}{OSmdhhayQguwVZwmj3c3fw}{elasticsearch-master}{192.168.84.131}{192.168.84.131:9300}{m}{8.7.0}] from last-known cluster state; node term 1, last-accepted version 71 in term 1; for troubleshooting guidance, see https://www.elastic.co/guide/en/elasticsearch/reference/8.7/discovery-troubleshooting.html", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-client][cluster_coordination][T#1]","log.logger":"org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper","elasticsearch.cluster.uuid":"hKpt2rS0TbG1z2PjWRtVnQ","elasticsearch.node.id":"PSXY08nNT1S8KWVH6qKBaA","elasticsearch.node.name":"elasticsearch-client","elasticsearch.cluster.name":"elasticsearch"}
Can you please help me? I have been trying to solve this for 3 days. What could be the problem, why is it happening, and how can I solve it?
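One thing I noticed while comparing the log lines above: the master that the data and client nodes remember from the last-accepted cluster state and the master they discover now differ in both node ID and pod IP, not just the address. A small script with the values copied straight from the logs:

```python
# Values copied from the log lines above:
# - "from last-known cluster state" entry on the data/client nodes
# - the master entry they actually discover after the restart
last_known_master = {
    "node.id": "Jwgz0LUATzyyQk4qvU292g",
    "address": "192.168.84.131:9300",
}
discovered_master = {
    "node.id": "9hvRUjvsTXeVF-NIEwbZQA",
    "address": "192.168.247.28:9300",
}

# A changed pod IP is normal after a pod restart, but a changed node ID
# suggests the master came back with a fresh data path, i.e. as a brand-new
# node identity rather than the node the cluster state expects.
print("node ID changed:", last_known_master["node.id"] != discovered_master["node.id"])
print("address changed:", last_known_master["address"] != discovered_master["address"])
```

Does this mean the restarted master lost its persistent data directory, and could that explain why the other nodes see it but still report "master not discovered yet"?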