Getting "master not discovered or elected yet" causing cluster not up in version 7.1.0

Hi, I am building a 6-node cluster: nodes 1-3 are master nodes and nodes 4-6 are data nodes. I have the following in elasticsearch.yml on each node:

bootstrap.memory_lock: false
cluster.initial_master_nodes:

node.name: awselsdevlap01.est1933.com-esnode01

#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):

path.data: /es_data/data01/awselsdevlap01.est1933.com-esnode01,/es_data/data02/awselsdevlap01.est1933.com-esnode01,/es_data/data03/awselsdevlap01.est1933.com-esnode01,/es_data/data04/awselsdevlap01.est1933.com-esnode01,/es_data/data05/awselsdevlap01.est1933.com-esnode01

path.logs: /es_data/es_logs/awselsdevlap01.est1933.com-esnode01

The following is what the logs look like:

[2019-05-23T00:00:35,579][WARN ][o.e.c.c.ClusterFormationFailureHelper] [awselsdevlap01.est1933.com-esnode01] master not discovered or elected yet, an election requires at least 2 nodes with ids from [UWgnBPsHQ1aW4xXEZVKyJQ, T_8EuKpaTqWg2oP3TAnAaA, YZ6m2ioDQWqi1cNnOteB6w], have discovered [{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{-D-VHjdeSUyJdlTauLVuQw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{-CoUrjn9QlKE-K5SqZ-JYw}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}] which is a quorum; discovery will continue using [10.173.148.65:9300, 10.173.148.73:9300, 10.173.148.58:9300, 10.173.148.50:9300, 10.173.148.67:9300] from hosts providers and [{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{epXUK1dTSKCf0Ca9CphE3A}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 22, last-accepted version 2867 in term 21

Any idea what's wrong with my config? I checked other similar postings and ensured that the list in cluster.initial_master_nodes matches the node.name settings.


Hmm, this means the nodes have all found each other but for some reason cannot form a cluster. Are there any other log messages? Note that there's an open issue to do with insufficient logging if your security configuration is broken - see #42153. Can you try setting:

logger.org.elasticsearch.discovery: TRACE

and see if you get any more useful messages?

I turned on the trace and I see a bunch of errors about connection refused, even from a node to itself. Not sure why.

[2019-05-23T16:45:29,789][DEBUG][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.58:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [10.173.148.58:9300] connect_exception
at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1299) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:99) ~[elasticsearch-7.1.0.jar:7.1.0]
at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.1.0.jar:7.1.0]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.1.0.jar:7.1.0]
at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$new$1(Netty4TcpChannel.java:72) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: awselsdevlap04.est1933.com/10.173.148.58:9300
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more

The full log is here: https://pastebin.com/kY7FghPU

discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.unicast.hosts: awselsdevlap01.est1933.com, awselsdevlap02.est1933.com, awselsdevlap03.est1933.com, awselsdevlap04.est1933.com, awselsdevlap05.est1933.com, awselsdevlap06.est1933.com

In 7.1.0 you don't need discovery.zen.minimum_master_nodes, and discovery.zen.ping.unicast.hosts is now called discovery.seed_hosts.

Also discovery.seed_hosts should only really contain the master-eligible nodes and I think that is the source of most of the Connection refused messages.
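
If you want to rule out basic networking while you're at it, a quick sanity check from each master-eligible node could look something like this (a sketch assuming ss and nc are available; substitute each peer's hostname in turn):

# is this node itself listening on the transport port? (run on the node)
ss -lnt | grep 9300
# can this node reach a peer's transport port?
nc -vz awselsdevlap02.est1933.com 9300

A "Connection refused" from nc means nothing is listening at that address, which usually points at a node that isn't running or is bound to a different interface.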

You also don't need cluster.initial_master_nodes any more since this cluster already started up at least once.

However none of that actually answers the question of why this cluster isn't forming.

Can you try fixing these things above and then setting the following?

logger.org.elasticsearch.cluster.coordination: TRACE
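
For reference, a minimal sketch of what the discovery section could look like on your three master-eligible nodes in 7.x (hostnames taken from your post; adjust to your environment):

discovery.seed_hosts: ["awselsdevlap01.est1933.com", "awselsdevlap02.est1933.com", "awselsdevlap03.est1933.com"]
# discovery.zen.minimum_master_nodes is no longer needed - 7.x manages the quorum itself
# cluster.initial_master_nodes is only needed the very first time a brand-new cluster starts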

I updated the yml file as below:

bootstrap.memory_lock: false
discovery.seed_hosts: awselsdevlap01.est1933.com, awselsdevlap02.est1933.com, awselsdevlap03.est1933.com
http.port: 9200
network.host: _site_
transport.tcp.port: 9300
xpack.security.authc.realms.ldap.ldap1.bind_dn: uid=s-elasticsearch,ou=people,o=ejgallo.com
xpack.security.authc.realms.ldap.ldap1.group_search.base_dn: ou=groups,o=ejgallo.com
xpack.security.authc.realms.ldap.ldap1.order: 1
xpack.security.authc.realms.ldap.ldap1.url: ldaps://gdsprd01.ejgallo.com:636
xpack.security.authc.realms.ldap.ldap1.user_search.base_dn: ou=people,o=ejgallo.com
xpack.security.enabled: true

node.name: awselsdevlap01.est1933.com-esnode01

#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):

path.data: /es_data/data01/awselsdevlap01.est1933.com-esnode01,/es_data/data02/awselsdevlap01.est1933.com-esnode01,/es_data/data03/awselsdevlap01.est1933.com-esnode01,/es_data/data04/awselsdevlap01.est1933.com-esnode01,/es_data/data05/awselsdevlap01.est1933.com-esnode01

path.logs: /es_data/es_logs/awselsdevlap01.est1933.com-esnode01

action.auto_create_index: true

logger.org.elasticsearch.discovery: TRACE
logger.org.elasticsearch.cluster.coordination: TRACE

Now I am getting errors in the logs; the errors are different now, but still no quorum is formed.

https://pastebin.com/bXvj9D1v

Thanks, this seems to be indicating that awselsdevlap03.est1933.com-esnode03 should be winning the election. Could you enable the same logging on that node and share the logs from it too?

Did that. Here is the log from 03.

https://pastebin.com/XNE4MUsV

Ok that explains it:

[2019-05-23T20:10:42,549][WARN ][o.e.c.c.ClusterFormationFailureHelper] [awselsdevlap03.est1933.com-esnode03] master not discovered or elected yet, an election requires at least 3 nodes with ids from [UWgnBPsHQ1aW4xXEZVKyJQ, CM68Qk8DQ3KMZDDPc7wHkw, s3s5fQG6SwyWvtDPDKZ2gQ, 5emh3C9PQ_aK9Si-3uF_iQ, YZ6m2ioDQWqi1cNnOteB6w], have discovered [{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{VzR0zqVzT9-tTqCePts5GA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [10.173.148.143:9300, 10.173.148.65:9300] from hosts providers and [{awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zQJRt6NJTS6ysTIo3VIbqg}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 22, last-accepted version 120434 in term 22

Here's what's needed for a successful election:

an election requires at least 3 nodes with ids from [UWgnBPsHQ1aW4xXEZVKyJQ, CM68Qk8DQ3KMZDDPc7wHkw, s3s5fQG6SwyWvtDPDKZ2gQ, 5emh3C9PQ_aK9Si-3uF_iQ, YZ6m2ioDQWqi1cNnOteB6w]

One of those node IDs is awselsdevlap03:

{awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}

Another is awselsdevlap01:

{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}

But awselsdevlap02 isn't in that list:

{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}

I think this means that you have removed half or more of the master-eligible nodes in the cluster, so none of the remaining nodes has an up-to-date cluster state. You need to reinstate at least one of the missing nodes to proceed.
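
Incidentally, once a master has been elected again, you can see exactly which node IDs are in the voting configuration with something like the following (localhost and the elastic user are just placeholders for your environment, and curl will prompt for the password since you have security enabled):

curl -s -u elastic 'http://localhost:9200/_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config&pretty'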

Can you tell us a bit more about what you've been doing to this cluster? Were you running it with a different configuration in the past? Were you using any unusual settings?

I did the initial installs using the Ansible script from https://github.com/elastic/ansible-elasticsearch, which does not enable security by default.

I was able to get the cluster running after that. Then I needed to get LDAP integration working, which requires enabling security, so I added the lines for the LDAP realm, which were working in version 6.7.0. But once they were added, I was never able to bring the cluster up again.

So at this point, what should I do to get the cluster going without pulling all my hair out? :frowning:

As I said, you need to reinstate at least one of the missing nodes to proceed.

I made the same change in the yml file on node 02; the logs from 01, 02 and 03 all show entries such as the below:

[2019-05-23T21:43:08,343][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap01.est1933.com] to [10.173.148.143:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap02.est1933.com] to [10.173.148.65:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap03.est1933.com] to [10.173.148.73:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] probing resolved transport addresses [10.173.148.143:9300, 10.173.148.73:9300]
[2019-05-23T21:43:08,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zA7D9P01SaalWMy7pn5YdA}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-J7niYWCS--b6Va3HjvSCQ}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{WYtQnncWTj26CKXTjDeDYA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-23T21:43:08,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:08,537][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:08,705][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.143:9300, discoveryNode={awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-J7niYWCS--b6Va3HjvSCQ}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zA7D9P01SaalWMy7pn5YdA}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] probing master nodes from cluster state: nodes:
{awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{WYtQnncWTj26CKXTjDeDYA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-05-23T21:43:09,343][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:09,344][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] Peer{transportAddress=10.173.148.143:9300, discoveryNode={awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-J7niYWCS--b6Va3HjvSCQ}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{zA7D9P01SaalWMy7pn5YdA}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{WYtQnncWTj26CKXTjDeDYA}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-23T21:43:09,344][TRACE][o.e.d.PeerFinder ] [awselsdevlap02.est1933.com-esnode02] startProbe(10.173.148.65:9300) not probing local node
[2019-05-23T21:43:09,344][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap01.est1933.com] to [10.173.148.143:9300]
[2019-05-23T21:43:09,344][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap02.est1933.com-esnode02] resolved host [awselsdevlap02.est1933.com] to [10.173.148.65:9300]

Hi, I am still not able to form a cluster. What else could be incorrect?

What actions have you taken since your last post? What are the logs saying now?

On each node, the log lists the hosts being resolved, and it looks like each node was able to find the other peers, but the cluster still cannot be formed.

[2019-05-28T17:07:59,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,191][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap01.est1933.com-esnode01] resolved host [awselsdevlap01.est1933.com] to [10.173.148.143:9300]
[2019-05-28T17:07:59,191][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap01.est1933.com-esnode01] resolved host [awselsdevlap02.est1933.com] to [10.173.148.65:9300]
[2019-05-28T17:07:59,191][TRACE][o.e.d.SeedHostsResolver ] [awselsdevlap01.est1933.com-esnode01] resolved host [awselsdevlap03.est1933.com] to [10.173.148.73:9300]
[2019-05-28T17:07:59,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] probing resolved transport addresses [10.173.148.65:9300, 10.173.148.73:9300]
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.65:9300, discoveryNode={awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{9j1XEYckRWaCHQDkymoHtw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-SSC-sLSShuU_Zh-6sOScA}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{NGvD66xeQyWTQ8u4jwnWnQ}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{NGvD66xeQyWTQ8u4jwnWnQ}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional.empty, knownPeers=[{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-SSC-sLSShuU_Zh-6sOScA}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, {awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{9j1XEYckRWaCHQDkymoHtw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}], term=22}
[2019-05-28T17:07:59,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,481][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:07:59,655][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] startProbe(10.173.148.143:9300) not probing local node
[2019-05-28T17:08:00,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.65:9300, discoveryNode={awselsdevlap02.est1933.com-esnode02}{T_8EuKpaTqWg2oP3TAnAaA}{9j1XEYckRWaCHQDkymoHtw}{10.173.148.65}{10.173.148.65:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-28T17:08:00,191][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] Peer{transportAddress=10.173.148.73:9300, discoveryNode={awselsdevlap03.est1933.com-esnode03}{UWgnBPsHQ1aW4xXEZVKyJQ}{NGvD66xeQyWTQ8u4jwnWnQ}{10.173.148.73}{10.173.148.73:9300}{ml.machine_memory=31980478464, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-05-28T17:08:00,192][TRACE][o.e.d.PeerFinder ] [awselsdevlap01.est1933.com-esnode01] probing master nodes from cluster state: nodes:
{awselsdevlap01.est1933.com-esnode01}{YZ6m2ioDQWqi1cNnOteB6w}{-SSC-sLSShuU_Zh-6sOScA}{10.173.148.143}{10.173.148.143:9300}{ml.machine_memory=31980478464, xpack.installed=true, ml.max_open_jobs=20}, local

It does not look like you have reinstated any of the missing nodes, as instructed above.

When you say reinstate a node, what exactly does that mean? Excuse my lack of Elasticsearch terms. I have 6 nodes: 3 master nodes and 3 data nodes. I started all 3 master nodes. The log entries on each node are almost the same.

When this cluster was last running there were more master-eligible nodes than there are now. The last log you shared said that an election requires at least 3 nodes with IDs from a list of 5.

At the moment, only 2 of the 5 listed nodes are available, and this is preventing an election from taking place. In order to allow a master to be elected you need to put one of the missing master-eligible nodes back into service. Unfortunately we can't really tell where these nodes have gone - all we know is that they're not running and talking to the current set of nodes.
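
To make the arithmetic concrete: a quorum is a strict majority of the voting configuration, so with 5 node IDs listed an election needs

quorum = floor(5 / 2) + 1 = 3 voting nodes

and with only 2 of them running, no election can succeed.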

Try setting the following in the elasticsearch.yml config file on all master nodes:

cluster.initial_master_nodes: ["node_name1", "node_name2", ... ]

Hi @Corneliu, this won't help. The cluster.initial_master_nodes setting is ignored once the first election has completed, which is the case in this cluster.

I am still fighting with this cluster-formation issue, and I have a related question: where is the last cluster state saved? I want to wipe out the history of the cluster and start over from scratch.