Cannot bootstrap a new cluster: master not discovered or elected yet

Hi,
After running into problems on another test cluster (I asked for help in this post), I'm trying to set up a new cluster from scratch to understand what's happening.

I'm stuck: this is the third time I've tried to bring up the new cluster, and the master node still cannot be discovered or elected.

My three nodes are:

- elastic1.domain.com (192.168.245.71)
- elastic2.domain.com (192.168.245.72)
- elastic3.domain.com (192.168.245.73)

Every node is on the same network and all nodes can reach each other; there are no firewall rules. Elasticsearch is running on every node and listening on that node's own IP. Every node is both master-eligible and a data node.

This is the node configuration:

cluster.name: mycluster
node.name: elastic[1:3].domain.com
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host:
  - 127.0.0.1
  - 192.168.245.7[1:3]

network.publish_host: 192.168.245.7[1:3]

discovery.seed_hosts:
  - elastic1.domain.com
  - elastic2.domain.com
  - elastic3.domain.com

cluster.initial_master_nodes:
  - elastic1.domain.com
  - elastic2.domain.com
  - elastic3.domain.com

I started every node from scratch, one by one, with the /var/lib/elasticsearch folder empty. I have repeated the whole procedure twice without any success.

In the logs I see many errors like these:

[2019-09-17T18:19:57,497][INFO ][o.e.c.c.JoinHelper       ] [elastic3.domain.com] failed to join {elastic2.domain.com}{hwnyk1WnRYmD9yktsXg73g}{SZgLY6V2TkS_Ossu0gUxJw}{192.168.245.72}{192.168.245.72:9300}{dim}{ml.machine_memory=16822104064, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={elastic3.domain.com}{KjVRP9WgTQy-P8JV18LbDw}{bVvohjzoQ0C5LQcUKjckew}{192.168.245.73}{192.168.245.73:9300}{dim}{ml.machine_memory=16822104064, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=35, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={elastic3.domain.com}{KjVRP9WgTQy-P8JV18LbDw}{bVvohjzoQ0C5LQcUKjckew}{192.168.245.73}{192.168.245.73:9300}{dim}{ml.machine_memory=16822104064, xpack.installed=true, ml.max_open_jobs=20}, targetNode={elastic2.domain.com}{hwnyk1WnRYmD9yktsXg73g}{SZgLY6V2TkS_Ossu0gUxJw}{192.168.245.72}{192.168.245.72:9300}{dim}{ml.machine_memory=16822104064, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.NodeDisconnectedException: [elastic2.domain.com][192.168.245.72:9300][internal:cluster/coordination/join] disconnected
[2019-09-17T18:19:57,501][INFO ][o.e.c.c.Coordinator      ] [elastic3.domain.com] master node [{elastic2.domain.com}{hwnyk1WnRYmD9yktsXg73g}{SZgLY6V2TkS_Ossu0gUxJw}{192.168.245.72}{192.168.245.72:9300}{dim}{ml.machine_memory=16822104064, ml.max_open_jobs=20, xpack.installed=true}] failed, restarting discovery
org.elasticsearch.transport.ConnectTransportException: [elastic2.domain.com][192.168.245.72:9300] disconnected during check
    at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler$1.handleException(LeaderChecker.java:268) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:544) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.handleWakeUp(LeaderChecker.java:237) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.cluster.coordination.LeaderChecker.updateLeader(LeaderChecker.java:150) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:620) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:243) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.cluster.coordination.FollowersChecker$2.doRun(FollowersChecker.java:187) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.3.2.jar:7.3.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic2.domain.com][192.168.245.72:9300] Node not connected
    at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:151) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:568) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:540) ~[elasticsearch-7.3.2.jar:7.3.2]
    ... 10 more

Could you help me, please?

I'm really stuck on this problem.
Thanks!

Is the [1:3] really in your elasticsearch.yml, or is that just shorthand for the post?

I don't think you can use multiple addresses in network.host, nor that [1:3] notation.

Try network.host: 0.0.0.0 and remove network.publish_host.
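
For example, each node's network section could then be reduced to something like this (a sketch; keep your existing cluster.name, node.name, paths, and discovery settings):

```yaml
# Bind on all interfaces and let Elasticsearch derive the publish
# address automatically; network.publish_host is removed entirely.
network.host: 0.0.0.0
```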

The [1:3] is not literally written in the configuration file.
It's shorthand to indicate that each node has its own IP address specified in its configuration file.

The real configuration parameters are:

elastic1:

network.host:
  - 127.0.0.1
  - 192.168.245.71

elastic2:

network.host:
  - 127.0.0.1
  - 192.168.245.72

elastic3:

network.host:
  - 127.0.0.1
  - 192.168.245.73

Thanks!

The few log messages you have shared are suggestive of network issues: it looks like a connection between the nodes is being established and then dropped when the nodes start to exchange meaningful information.
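
To narrow this down, you could verify from each node that the others' transport port is actually reachable over time, not just once. A minimal sketch using nc (host names taken from your post; adjust as needed):

```shell
# Sketch: check whether a host's transport port (9300) accepts
# connections, with a 2-second timeout per attempt.
check_transport() {
  nc -z -w 2 "$1" 9300 >/dev/null 2>&1 \
    && echo "$1:9300 reachable" \
    || echo "$1:9300 NOT reachable"
}

# Run this from each node against the other two.
for host in elastic1.domain.com elastic2.domain.com elastic3.domain.com; do
  check_transport "$host"
done
```

If connections succeed initially but the join still fails, that points at something dropping established connections rather than blocking them outright.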

One possibility: do you have any kind of security device (e.g. firewall or IDS) which might be considering the Elasticsearch traffic as suspicious and dropping these connections? Elasticsearch will be exchanging information containing IP addresses and host names and so on and it's certainly possible that a badly-configured IDS could be triggered by that kind of traffic. If so, either disable it or else enable TLS on your cluster so it can't see the traffic any more.
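
If you go the TLS route, a minimal transport-TLS sketch for ES 7.x looks like the following. The certificate path is an assumption here; you would first generate elastic-certificates.p12 with bin/elasticsearch-certutil and copy it to each node's config directory:

```yaml
# Transport-layer TLS (sketch): encrypts node-to-node traffic so
# intermediate devices cannot inspect (or mangle) it.
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
```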

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.