Unable to join two nodes into a cluster

Hello,

I'm trying to set up a two-node ES cluster, and I had all but given up by the time I posted this.

I have two instances running, with the following configs:

# es-01
cluster.name: dc-world
node.name: smallville
path.data: /es/data
path.logs: /es/log
network.host: 0.0.0.0
discovery.seed_hosts: ["10.116.48.116"]  # IP of es-02
cluster.initial_master_nodes: ["gotham", "smallville"]

# es-02
cluster.name: dc-world
node.name: gotham
path.data: /es/data
path.logs: /es/log
network.host: 0.0.0.0
network.publish_host: _site_ 
cluster.initial_master_nodes: ["gotham", "smallville"]

I'm getting the following error (from the es-01 logs; stack trace reduced due to the post limit):

[2019-07-19T06:33:02,824][WARN ][o.e.c.c.ClusterFormationFailureHelper] [smallville] master not discovered or elected yet, an election requires two nodes with ids [JYKLqIsvR0yruH0ecPG4wA, CTAdAiYbT-ajAXtSVEs3Bw], have discovered [{gotham}{CTAdAiYbT-ajAXtSVEs3Bw}{V18vdJW3TyKlfMJIJhlGkQ}{10.116.48.116}{10.116.48.116:9300}{ml.machine_memory=16651354112, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [10.116.48.116:9300] from hosts providers and [{smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 149, last-accepted version 0 in term 0
[2019-07-19T06:33:06,904][INFO ][o.e.c.c.JoinHelper       ] [smallville] failed to join {smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20} with JoinRequest{sourceNode={smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=150, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}, targetNode={smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}}]}
org.elasticsearch.transport.RemoteTransportException: [smallville][161.202.2.237:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: received a newer join from {smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}
	at org.elasticsearch.cluster.coordination.JoinHelper$CandidateJoinAccumulator.handleJoinRequest(JoinHelper.java:451) [elasticsearch-7.2.0.jar:7.2.0]
......
....

es-02 logs:

[2019-07-19T06:37:45,288][WARN ][o.e.c.c.ClusterFormationFailureHelper] [gotham] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [gotham, smallville] to bootstrap a cluster: have discovered []; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305] from hosts providers and [{gotham}{CTAdAiYbT-ajAXtSVEs3Bw}{V18vdJW3TyKlfMJIJhlGkQ}{10.116.48.116}{10.116.48.116:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 218, last-accepted version 0 in term 0
[2019-07-19T06:38:28,567][INFO ][o.e.c.c.JoinHelper       ] [gotham] failed to join {smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={gotham}{CTAdAiYbT-ajAXtSVEs3Bw}{V18vdJW3TyKlfMJIJhlGkQ}{10.116.48.116}{10.116.48.116:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=225, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={gotham}{CTAdAiYbT-ajAXtSVEs3Bw}{V18vdJW3TyKlfMJIJhlGkQ}{10.116.48.116}{10.116.48.116:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}, targetNode={smallville}{JYKLqIsvR0yruH0ecPG4wA}{-9WFnGHqQnaxid1cV4oAHg}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.NodeNotConnectedException: [smallville][161.202.2.237:9300] Node not connected
	at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:151) ~[elasticsearch-7.2.0.jar:7.2.0]
...
...

This came after numerous failed attempts, and after reading/digesting everything I could find (forming single-node clusters; security group rule issues; understanding 9200 vs 9300; almost failing to understand unicast; reading articles about 6.8's Zen discovery).

Before I added network.publish_host: _site_ to the es-02 config, I was getting this in the logs (from es-01; similar on es-02):

[2019-07-19T05:53:55,034][WARN ][o.e.c.c.ClusterFormationFailureHelper] [smallville] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [10.116.48.112, 10.116.48.116] to bootstrap a cluster: have discovered []; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305] from hosts providers and [{smallville}{8MEe4svaTseS-P3BwzjK9A}{ecMet_NNRv2arG-ZAuuq7g}{161.202.2.237}{161.202.2.237:9300}{ml.machine_memory=16651354112, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

If you need any more info, please let me know.
I'm on version 7.2.0 (build 508c38a, from RPM), running CentOS 7.

I am guessing you've tried a bunch of different settings and it looks like the nodes have got into a bit of a funny state as a result. Has this cluster ever worked? If not, and it contains no data, then the best thing to try is:

  • shut both of the nodes down
  • delete the contents of /es/data on both nodes
  • start them up again

The configs look reasonable, but perhaps are not wholly consistent with the state on disk any more, so this'll reset that state and give you a clean start.
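
If it helps, here's a minimal sketch of that reset, assuming the RPM's systemd service name and the path.data from your configs (run on both nodes):

sudo systemctl stop elasticsearch
sudo rm -rf /es/data/*              # wipes all cluster state and data; only safe on an empty cluster
sudo systemctl start elasticsearch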

One other thing: it's generally best to put the addresses of all master-eligible nodes in the discovery.seed_hosts setting on each node, including the local node. It's not essential, but if your nodes' configs are symmetric, future troubleshooting will likely be easier.
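
For example, something like this on both nodes; a sketch assuming, based on your earlier log, that 10.116.48.112 is es-01/smallville's private IP and 10.116.48.116 is es-02/gotham's:

discovery.seed_hosts: ["10.116.48.112", "10.116.48.116"]
cluster.initial_master_nodes: ["gotham", "smallville"]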

Hmm, actually it looks like you also have connectivity issues. gotham seems to think its address is 10.116.48.116:9300, but smallville thinks its address is 161.202.2.237:9300. Should they really have such wildly different addresses? Why do you have network.publish_host: _site_ on one node and not the other? Can you try getting rid of it and setting network.host to each node's proper address instead of 0.0.0.0?
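
For instance, on es-01 (again assuming 10.116.48.112 is its private address, going by your earlier log; es-02 would use its own IP):

network.host: 10.116.48.112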

161.202.2.237 is the public IP. Both instances are inside a VPC, hence I tried changing the publish_host address to the local IP (but only on a single node; I assumed it was just for discovery purposes: if A can reach B, then B can communicate with A and find out where A is).

I had missed the fact that network.host is also used to identify the host. When I set it to 0.0.0.0, I wanted the node to be accessible from anywhere (I had Security Groups in place to block public access).
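
As I now understand it, network.host is shorthand that sets both of these lower-level settings, so what I actually wanted was roughly:

network.bind_host: 0.0.0.0    # which interfaces the node listens on
network.publish_host: _site_  # the single address advertised to other nodes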

Just adding the following two lines did the trick:

network.host: _site_
network.publish_host: _site_

(Also, I cleaned both the data and log directories before changing the config.)

The bind address is now the private network IP, which is absolutely fine.
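
For anyone following along, you can confirm both nodes joined using the _cat/nodes API, e.g.:

curl -s 'localhost:9200/_cat/nodes?v'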

Thanks,

network.publish_host defaults to being equal to network.host, so I think you do not need to set it explicitly like this.
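
In other words, this alone should be enough on each node:

network.host: _site_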
