Hello everyone,
I created an Elasticsearch environment with 4 nodes: 2 nodes have the master role and the other 2 have the data role. When all nodes are started, everything is fine. But when either master node goes down, my cluster stops working: the other master node is not elected as master.
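For reference, the node roles and the currently elected master can be checked with the _cat/nodes API, roughly like this (assuming the default HTTP port 9200; security is disabled in my config):

curl -s 'http://10.30.40.31:9200/_cat/nodes?v&h=name,node.role,master'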
My cluster configuration:
VPSYSMNGELK01 (10.30.40.30)
path.data: /usr/share/Elasticsearch/data
node.roles: [ master ]
network.host: 0.0.0.0
#transport.host: 0.0.0.0
bootstrap.memory_lock: true
cluster.name: MyCluster
node.name: "VPSYSMNGELK01.test.com"
discovery.seed_hosts: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com", "VPSYSMNGELK03.test.com", "VPSYSMNGELK04.test.com"]
cluster.initial_master_nodes: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com"]
xpack.security.enabled: false
VPSYSMNGELK02 (10.30.40.31)
path.data: /usr/share/Elasticsearch/data
node.roles: [ master ]
network.host: 0.0.0.0
#transport.host: 0.0.0.0
bootstrap.memory_lock: true
cluster.name: MyCluster
node.name: "VPSYSMNGELK02.test.com"
discovery.seed_hosts: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com", "VPSYSMNGELK03.test.com", "VPSYSMNGELK04.test.com"]
cluster.initial_master_nodes: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com"]
xpack.security.enabled: false
VPSYSMNGELK03 (10.30.40.32)
path.data: /usr/share/Elasticsearch/data
node.roles: [ data ]
network.host: 0.0.0.0
#transport.host: 0.0.0.0
bootstrap.memory_lock: true
cluster.name: MyCluster
node.name: "VPSYSMNGELK03.test.com"
discovery.seed_hosts: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com", "VPSYSMNGELK03.test.com", "VPSYSMNGELK04.test.com"]
cluster.initial_master_nodes: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com"]
xpack.security.enabled: false
VPSYSMNGELK04 (10.30.40.33)
path.data: /usr/share/Elasticsearch/data
node.roles: [ data ]
network.host: 0.0.0.0
#transport.host: 0.0.0.0
bootstrap.memory_lock: true
cluster.name: MyCluster
node.name: "VPSYSMNGELK04.test.com"
discovery.seed_hosts: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com", "VPSYSMNGELK03.test.com", "VPSYSMNGELK04.test.com"]
cluster.initial_master_nodes: ["VPSYSMNGELK01.test.com", "VPSYSMNGELK02.test.com"]
xpack.security.enabled: false
When every node is started, the cluster forms and everything works fine.
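For reference, cluster health can be checked over HTTP roughly like this (again assuming the default port 9200, since security is disabled):

curl -s 'http://10.30.40.31:9200/_cluster/health?pretty'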
However, when I stop the VPSYSMNGELK01 master node, the cluster fails.
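The node is stopped roughly like this (assuming a systemd-managed package install, which matches the paths above):

sudo systemctl stop elasticsearch

After that, the log on VPSYSMNGELK02 shows: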
[2022-03-20T15:03:07,093][INFO ][o.e.c.c.Coordinator ] [VPSYSMNGELK02.test.com] master node [{VPSYSMNGELK01.test.com}{RA4OtAorQpiwcs5WCI6r1Q}{GI_aBd_YSi-Ere_kvLZ_yw}{10.30.40.30}{10.30.40.30:9300}{m}] disconnected, restarting discovery
[2022-03-20T15:03:07,097][INFO ][o.e.c.s.ClusterApplierService] [VPSYSMNGELK02.test.com] master node changed {previous [{VPSYSMNGELK01.test.com}{RA4OtAorQpiwcs5WCI6r1Q}{GI_aBd_YSi-Ere_kvLZ_yw}{10.30.40.30}{10.30.40.30:9300}{m}], current []}, term: 12, version: 343, reason: becoming candidate: onLeaderFailure
[2022-03-20T15:03:07,104][WARN ][o.e.c.NodeConnectionsService] [VPSYSMNGELK02.test.com] failed to connect to {VPSYSMNGELK01.test.com}{RA4OtAorQpiwcs5WCI6r1Q}{GI_aBd_YSi-Ere_kvLZ_yw}{10.30.40.30}{10.30.40.30:9300}{m}{xpack.installed=true} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [VPSYSMNGELK01.test.com][10.30.40.30:9300] connect_exception
at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1107) ~[elasticsearch-8.1.0.jar:8.1.0]
at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$0(ActionListener.java:279) ~[elasticsearch-8.1.0.jar:8.1.0]
at org.elasticsearch.core.CompletableContext.lambda$addListener$0(CompletableContext.java:31) ~[elasticsearch-core-8.1.0.jar:8.1.0]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
at org.elasticsearch.core.CompletableContext.completeExceptionally(CompletableContext.java:46) ~[elasticsearch-core-8.1.0.jar:8.1.0]
at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:63) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.30.40.30/10.30.40.30:9300
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
... 7 more
[2022-03-20T15:03:17,103][WARN ][o.e.c.c.ClusterFormationFailureHelper] [VPSYSMNGELK02.test.com] master not discovered or elected yet, an election requires a node with id [RA4OtAorQpiwcs5WCI6r1Q], have only discovered non-quorum [{VPSYSMNGELK02.test.com}{DJf3OpY6RaepLMJHKpSwSA}{JEOLyTh5QNWaSmaV8Phn7w}{10.30.40.31}{10.30.40.31:9300}{m}]; discovery will continue using [10.30.40.30:9300, 10.30.40.32:9300, 10.30.40.33:9300] from hosts providers and [{VPSYSMNGELK01.test.com}{RA4OtAorQpiwcs5WCI6r1Q}{GI_aBd_YSi-Ere_kvLZ_yw}{10.30.40.30}{10.30.40.30:9300}{m}, {VPSYSMNGELK02.test.com}{DJf3OpY6RaepLMJHKpSwSA}{JEOLyTh5QNWaSmaV8Phn7w}{10.30.40.31}{10.30.40.31:9300}{m}] from last-known cluster state; node term 12, last-accepted version 343 in term 12
[2022-03-20T15:03:27,106][WARN ][o.e.c.c.ClusterFormationFailureHelper] [VPSYSMNGELK02.test.com] master not discovered or elected yet, an election requires a node with id [RA4OtAorQpiwcs5WCI6r1Q], have only discovered non-quorum [{VPSYSMNGELK02.test.com}{DJf3OpY6RaepLMJHKpSwSA}{JEOLyTh5QNWaSmaV8Phn7w}{10.30.40.31}{10.30.40.31:9300}{m}]; discovery will continue using [10.30.40.30:9300, 10.30.40.32:9300, 10.30.40.33:9300] from hosts providers and [{VPSYSMNGELK01.test.com}{RA4OtAorQpiwcs5WCI6r1Q}{GI_aBd_YSi-Ere_kvLZ_yw}{10.30.40.30}{10.30.40.30:9300}{m}, {VPSYSMNGELK02.test.com}{DJf3OpY6RaepLMJHKpSwSA}{JEOLyTh5QNWaSmaV8Phn7w}{10.30.40.31}{10.30.40.31:9300}{m}] from last-known cluster state; node term 12, last-accepted version 343 in term 12
[2022-03-20T15:03:37,113][WARN ][o.e.c.c.ClusterFormationFailureHelper] [VPSYSMNGELK02.test.com] master not discovered or elected yet, an election requires a node with id [RA4OtAorQpiwcs5WCI6r1Q], have only discovered non-quorum [{VPSYSMNGELK02.test.com}{DJf3OpY6RaepLMJHKpSwSA}{JEOLyTh5QNWaSmaV8Phn7w}{10.30.40.31}{10.30.40.31:9300}{m}]; discovery will continue using [10.30.40.30:9300, 10.30.40.32:9300, 10.30.40.33:9300] from hosts providers and [{VPSYSMNGELK01.test.com}{RA4OtAorQpiwcs5WCI6r1Q}{GI_aBd_YSi-Ere_kvLZ_yw}{10.30.40.30}{10.30.40.30:9300}{m}, {VPSYSMNGELK02.test.com}{DJf3OpY6RaepLMJHKpSwSA}{JEOLyTh5QNWaSmaV8Phn7w}{10.30.40.31}{10.30.40.31:9300}{m}] from last-known cluster state; node term 12, last-accepted version 343 in term 12
Could you help me to solve the problem?