Hi,
After having some problems on another test cluster (I asked for help in this post), I'm trying to set up a new cluster from scratch to understand what's happening.
I'm stuck: this is the third time I've tried to start the new cluster, and the master node still cannot be discovered.
My three nodes are:
- elastic1.domain.com: 192.168.245.71
- elastic2.domain.com: 192.168.245.72
- elastic3.domain.com: 192.168.245.73
All nodes are on the same network and can see each other, there are no firewall rules, and Elasticsearch is running on every node and listening on that node's own IP. Every node is both master-eligible and a data node.
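For reference, this is roughly how the reachability claim can be verified on the transport port rather than just with ping. It is only a minimal sketch: port 9300 is taken from the log lines further down, and the function names `can_connect` and `check_nodes` are made up for illustration.

```python
import socket

# Host names from the node list above; 9300 is the transport port shown in the logs.
NODES = ["elastic1.domain.com", "elastic2.domain.com", "elastic3.domain.com"]
TRANSPORT_PORT = 9300


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_nodes(nodes=NODES, port=TRANSPORT_PORT, timeout=3.0):
    """Map each host name to whether its transport port accepts connections."""
    return {host: can_connect(host, port, timeout) for host in nodes}
```

Running `check_nodes()` on each of the three machines gives one True/False entry per host. A False entry for any pair would point to a network or binding problem rather than to the discovery settings.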
This is the node configuration:
cluster.name: mycluster
node.name: elastic[1:3].domain.com
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host:
- 127.0.0.1
- 192.168.245.7[1:3]
network.publish_host: 192.168.245.7[1:3]
discovery.seed_hosts:
- elastic1.domain.com
- elastic2.domain.com
- elastic3.domain.com
cluster.initial_master_nodes:
- elastic1.domain.com
- elastic2.domain.com
- elastic3.domain.com
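To be explicit about the [1:3] shorthand in the configuration above: each node has its own index in node.name, network.host and network.publish_host, while the other settings are identical everywhere. A minimal sketch of that expansion, where the helper `node_settings` is hypothetical and only illustrates the mapping:

```python
def node_settings(index, domain="domain.com", ip_prefix="192.168.245.7"):
    """Return the per-node values that the [1:3] shorthand stands for."""
    host = f"elastic{index}.{domain}"
    ip = f"{ip_prefix}{index}"
    return {
        "node.name": host,
        "network.host": ["127.0.0.1", ip],
        "network.publish_host": ip,
    }


# Print the concrete values for each of the three nodes.
for i in (1, 2, 3):
    print(node_settings(i))
```

So on elastic2, for example, node.name is elastic2.domain.com and network.publish_host is 192.168.245.72.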
I started every node from scratch, one by one, each with an empty /var/lib/elasticsearch folder. I repeated the procedure twice without any success.
In the logs I see many errors like these:
[2019-09-17T18:19:57,497][INFO ][o.e.c.c.JoinHelper ] [elastic3.domain.com] failed to join {elastic2.domain.com}{hwnyk1WnRYmD9yktsXg73g}{SZgLY6V2TkS_Ossu0gUxJw}{192.168.245.72}{192.168.245.72:9300}{dim}{ml.machine_memory=16822104064, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={elastic3.domain.com}{KjVRP9WgTQy-P8JV18LbDw}{bVvohjzoQ0C5LQcUKjckew}{192.168.245.73}{192.168.245.73:9300}{dim}{ml.machine_memory=16822104064, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=35, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={elastic3.domain.com}{KjVRP9WgTQy-P8JV18LbDw}{bVvohjzoQ0C5LQcUKjckew}{192.168.245.73}{192.168.245.73:9300}{dim}{ml.machine_memory=16822104064, xpack.installed=true, ml.max_open_jobs=20}, targetNode={elastic2.domain.com}{hwnyk1WnRYmD9yktsXg73g}{SZgLY6V2TkS_Ossu0gUxJw}{192.168.245.72}{192.168.245.72:9300}{dim}{ml.machine_memory=16822104064, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.NodeDisconnectedException: [elastic2.domain.com][192.168.245.72:9300][internal:cluster/coordination/join] disconnected
[2019-09-17T18:19:57,501][INFO ][o.e.c.c.Coordinator ] [elastic3.domain.com] master node [{elastic2.domain.com}{hwnyk1WnRYmD9yktsXg73g}{SZgLY6V2TkS_Ossu0gUxJw}{192.168.245.72}{192.168.245.72:9300}{dim}{ml.machine_memory=16822104064, ml.max_open_jobs=20, xpack.installed=true}] failed, restarting discovery
org.elasticsearch.transport.ConnectTransportException: [elastic2.domain.com][192.168.245.72:9300] disconnected during check
at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler$1.handleException(LeaderChecker.java:268) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:544) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.handleWakeUp(LeaderChecker.java:237) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.coordination.LeaderChecker.updateLeader(LeaderChecker.java:150) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:620) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:243) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.coordination.FollowersChecker$2.doRun(FollowersChecker.java:187) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.3.2.jar:7.3.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic2.domain.com][192.168.245.72:9300] Node not connected
at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:151) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:568) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:540) ~[elasticsearch-7.3.2.jar:7.3.2]
... 10 more
Could you help me please?
I am really stuck on this problem.
Thanks!