Getting "master not discovered or elected yet" causing cluster not up in version 7.1.0

Hi, I am building a 6-node cluster: nodes 1-3 are master nodes and nodes 4-6 are data nodes. I have the following in elasticsearch.yml on each node:

bootstrap.memory_lock: false
cluster.initial_master_nodes:

node.name: awselsdevlap01.est1933.com-esnode01

#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):

path.data: /es_data/data01/awselsdevlap01.est1933.com-esnode01,/es_data/data02/awselsdevlap01.est1933.com-esnode01,/es_data/data03/awselsdevlap01.est1933.com-esnode01,/es_data/data04/awselsdevlap01.est1933.com-esnode01,/es_data/data05/awselsdevlap01.est1933.com-esnode01

path.logs: /es_data/es_logs/awselsdevlap01.est1933.com-esnode01

The following is what the logs look like:

[2019-05-23T00:00:35,579][WARN ][o.e.c.c.ClusterFormationFailureHelper] [awselsdevlap01.est1933.com-esnode01] master not discovered or elected yet, an election requires at least 2 nodes with ids from [UWgnBPsHQ1aW4xXEZVKyJQ, T_8EuKpaTqWg2oP3TAnAaA, YZ6m2ioDQWqi1cNnOteB6w], have discovered [{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{-D-VHjdeSUyJdlTauLVuQw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{-CoUrjn9QlKE-K5SqZ-JYw}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}] which is a quorum; discovery will continue using [10.173.148.65:9300, 10.173.148.73:9300, 10.173.148.58:9300, 10.173.148.50:9300, 10.173.148.67:9300] from hosts providers and [{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{epXUK1dTSKCf0Ca9CphE3A}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 22, last-accepted version 2867 in term 21

Any idea what's wrong with my config? I checked other similar postings and ensured that the list in cluster.initial_master_nodes matches the node.name settings.


Hmm, this means the nodes have all found each other but for some reason cannot form a cluster. Are there any other log messages? Note that there's an open issue to do with insufficient logging if your security configuration is broken - see #42153. Can you try setting:

logger.org.elasticsearch.discovery: TRACE

and see if you get any more useful messages?

I turned on the trace and I see a bunch of errors about connection refused, even from a node to itself. Not sure why.

[2019-05-23T16:45:29,789][DEBUG][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.58:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [10.173.148.58:9300] connect_exception
at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1299) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:99) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.1.0.jar:7.1.0]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.1.0.jar:7.1.0]
at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$new$1(Netty4TcpChannel.java:72) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: awselsdevlap04.est1933.com/10.173.148.58:9300
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more

The full log is here: https://pastebin.com/kY7FghPU

discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.unicast.hosts: awselsdevlap01.est1933.com, awselsdevlap02.est1933.com, awselsdevlap03.est1933.com, awselsdevlap04.est1933.com, awselsdevlap05.est1933.com, awselsdevlap06.est1933.com

In 7.1.0 you don't need discovery.zen.minimum_master_nodes, and discovery.zen.ping.unicast.hosts is now called discovery.seed_hosts.

Also discovery.seed_hosts should only really contain the master-eligible nodes and I think that is the source of most of the Connection refused messages.
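
If you want to rule out basic networking while you're at it, a quick sanity check from each master-eligible node could look something like this (a sketch assuming ss and nc are available; substitute each peer's hostname in turn):

# is this node itself listening on the transport port? (run on the node)
ss -lnt | grep 9300
# can this node reach a peer's transport port?
nc -vz awselsdevlap02.est1933.com 9300

A "Connection refused" from nc means nothing is listening at that address, which usually points at a node that isn't running or is bound to a different interface.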

You also don't need cluster.initial_master_nodes any more since this cluster already started up at least once.

However none of that actually answers the question of why this cluster isn't forming.

Can you try fixing these things above and then setting the following?

logger.org.elasticsearch.cluster.coordination: TRACE
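
For reference, a minimal sketch of what the discovery section could look like on your three master-eligible nodes in 7.x (hostnames taken from your post; adjust to your environment):

discovery.seed_hosts: ["awselsdevlap01.est1933.com", "awselsdevlap02.est1933.com", "awselsdevlap03.est1933.com"]
# discovery.zen.minimum_master_nodes is no longer needed - 7.x manages the quorum itself
# cluster.initial_master_nodes is only needed the very first time a brand-new cluster starts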

I updated the yml file as below:

bootstrap.memory_lock: false
discovery.seed_hosts: awselsdevlap01.est1933.com, awselsdevlap02.est1933.com, awselsdevlap03.est1933.com
http.port: 9200
network.host: _site_
transport.tcp.port: 9300
xpack.security.authc.realms.ldap.ldap1.bind_dn: uid=s-elasticsearch,ou=people,o=ejgallo.com
xpack.security.authc.realms.ldap.ldap1.group_search.base_dn: ou=groups,o=ejgallo.com
xpack.security.authc.realms.ldap.ldap1.order: 1
xpack.security.authc.realms.ldap.ldap1.url: ldaps://gdsprd01.ejgallo.com:636
xpack.security.authc.realms.ldap.ldap1.user_search.base_dn: ou=people,o=ejgallo.com
xpack.security.enabled: true

node.name: awselsdevlap01.est1933.com-esnode01

#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):

path.data: /es_data/data01/awselsdevlap01.est1933.com-esnode01,/es_data/data02/awselsdevlap01.est1933.com-esnode01,/es_data/data03/awselsdevlap01.est1933.com-esnode01,/es_data/data04/awselsdevlap01.est1933.com-esnode01,/es_data/data05/awselsdevlap01.est1933.com-esnode01

path.logs: /es_data/es_logs/awselsdevlap01.est1933.com-esnode01

action.auto_create_index: true

logger.org.elasticsearch.discovery: TRACE
logger.org.elasticsearch.cluster.coordination: TRACE

Now I am getting errors in the logs; the errors are different now, but still no quorum is formed.

https://pastebin.com/bXvj9D1v

Thanks, this seems to be indicating that awselsdevlap03.est1933.com-esnode03 should be winning the election. Could you enable the same logging on that node and share the logs from it too?

Did that. Here is the log from 03.

https://pastebin.com/XNE4MUsV

Ok that explains it:

[2019-05-23T20:10:42,549][WARN ][o.e.c.c.ClusterFormationFailureHelper] [awselsdevlap03.est1933.com-esnode03] master not discovered or elected yet, an election requires at least 3 nodes with ids from [UWgnBPsHQ1aW4xXEZVKyJQ, CM68Qk8DQ3KMZDDPc7wHkw, s3s5fQG6SwyWvtDPDKZ2gQ, 5emh3C9PQ_aK9Si-3uF_iQ, YZ6m2ioDQWqi1cNnOteB6w], have discovered [{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{VzR0zqVzT9-tTqCePts5GA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [10.173.148.143:9300, 10.173.148.65:9300] from hosts providers and [{awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zQJRt6NJTS6ysTIo3VIbqg}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 22, last-accepted version 120434 in term 22

Here's what's needed for a successful election:

an election requires at least 3 nodes with ids from [UWgnBPsHQ1aW4xXEZVKyJQ, CM68Qk8DQ3KMZDDPc7wHkw, s3s5fQG6SwyWvtDPDKZ2gQ, 5emh3C9PQ_aK9Si-3uF_iQ, YZ6m2ioDQWqi1cNnOteB6w]

One of those node IDs is awselsdevlap03:

{awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}

Another is awselsdevlap01:

{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}

But awselsdevlap02 isn't in that list:

{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}

I think this means that you have removed half or more of the master-eligible nodes in the cluster, so none of the remaining nodes has an up-to-date cluster state. You need to reinstate at least one of the missing nodes to proceed.
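
Incidentally, once a master has been elected again, you can see exactly which node IDs are in the voting configuration with something like the following (localhost and the elastic user are just placeholders for your environment, and curl will prompt for the password since you have security enabled):

curl -s -u elastic 'http://localhost:9200/_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config&pretty'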

Can you tell us a bit more about what you've been doing to this cluster? Were you running it with a different configuration in the past? Were you using any unusual settings?

I did the initial installs using the Ansible script from https://github.com/elastic/ansible-elasticsearch, which does not enable security by default.

I was able to get the cluster running after that. Then I needed to get LDAP integration working, which requires enabling security, so I added the lines for the LDAP realm, which were working in version 6.7.0. But once they were added, I was never able to bring the cluster up again.

So at this point, what should I do to get the cluster going without pulling all my hair out? :frowning:

As I said, you need to reinstate at least one of the missing nodes to proceed.

I made the same change in the yml file on node 02; the logs from 01, 02 and 03 all show entries such as the below:

[2019-05-23T21:43:08,343][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap01.est1933.com] to [10.173.148.143:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap02.est1933.com] to [10.173.148.65:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap03.est1933.com] to [10.173.148.73:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] probing resolved transport addresses [10.173.148.143:9300, 10.173.148.73:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zA7D9P01SaalWMy7pn5YdA}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-J7niYWCS--b6Va3HjvSCQ}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{WYtQnncWTj26CKXTjDeDYA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-23T21:43:08,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:08,537][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:08,705][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.143:9300, discoveryNode={awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-J7niYWCS--b6Va3HjvSCQ}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zA7D9P01SaalWMy7pn5YdA}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] probing master nodes from cluster state: nodes:
{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{WYtQnncWTj26CKXTjDeDYA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:09,344][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.143:9300, discoveryNode={awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-J7niYWCS--b6Va3HjvSCQ}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zA7D9P01SaalWMy7pn5YdA}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{WYtQnncWTj26CKXTjDeDYA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-23T21:43:09,344][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:09,344][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap01.est1933.com] to [10.173.148.143:9300]
[2019-05-23T21:43:09,344][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap02.est1933.com] to [10.173.148.65:9300]

Hi, I am still not able to form a cluster. What else could be incorrect?

What actions have you taken since your last post? What are the logs saying now?

On each node, the log lists the hosts being resolved, and it looks like each node was able to find the other peers, but the cluster still cannot be formed.

[2019-05-28T17:07:59,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,191][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap01.est1933.com-esnode01] resolved host [awselsdevlap01.est1933.com] to [10.173.148.143:9300]
[2019-05-28T17:07:59,191][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap01.est1933.com-esnode01] resolved host [awselsdevlap02.est1933.com] to [10.173.148.65:9300]
[2019-05-28T17:07:59,191][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap01.est1933.com-esnode01] resolved host [awselsdevlap03.est1933.com] to [10.173.148.73:9300]
[2019-05-28T17:07:59,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] probing resolved transport addresses [10.173.148.65:9300, 10.173.148.73:9300]
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.65:9300, discoveryNode={awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{9j1XEYckRWaCHQDkymoHtw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-SSC-sLSShuU_Zh-6sOScA}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{NGvD66xeQyWTQ8u4jwnWnQ}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{NGvD66xeQyWTQ8u4jwnWnQ}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-SSC-sLSShuU_Zh-6sOScA}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{9j1XEYckRWaCHQDkymoHtw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,481][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,655][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:08:00,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.65:9300, discoveryNode={awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{9j1XEYckRWaCHQDkymoHtw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-28T17:08:00,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{NGvD66xeQyWTQ8u4jwnWnQ}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-28T17:08:00,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] probing master nodes from cluster state: nodes:
{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-SSC-sLSShuU_Zh-6sOScA}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}, local

It does not look like you have reinstated any of the missing nodes, as instructed above.

When you say reinstate a node, what exactly does that mean? Excuse my lack of Elasticsearch terms. I have 6 nodes: 3 master nodes and 3 data nodes. I started all 3 master nodes. The log entries on each node are almost the same.

When this cluster was last running there were more master-eligible nodes than there are now. The last log you shared said that an election requires at least 3 nodes with IDs from a list of 5.

At the moment, only 2 of the 5 listed nodes are available, and this is preventing an election from taking place. In order to allow a master to be elected you need to put one of the missing master-eligible nodes back into service. Unfortunately we can't really tell where these nodes have gone - all we know is that they're not running and talking to the current set of nodes.
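
To make the arithmetic concrete: a quorum is a strict majority of the voting configuration, so with 5 node IDs listed an election needs

quorum = floor(5 / 2) + 1 = 3 voting nodes

and with only 2 of them running, no election can succeed.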

Try setting the following in the elasticsearch.yml config file on all master nodes:

cluster.initial_master_nodes: ["node_name1", "node_name2", ... ]

Hi @Corneliu, this won't help. The cluster.initial_master_nodes setting is ignored once the first election has completed, which is the case in this cluster.

I am still fighting with this cluster-formation issue, and I have a related question: where is the last cluster state saved? I want to wipe out the history of the cluster and start over from scratch.