Node will not join cluster

Running version 7.3.1.

I have a single-node cluster and it is operating fine. I want to join two other nodes to this cluster. Each time I try, I either get another single-node cluster, or Elasticsearch will not start. Here are the contents of the elasticsearch.yml on the single node.

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Elasticsearch_Test

# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master_node

# ---------------------------------- Network -----------------------------------
#
network.host: a.b.c.d
#
# Set a custom port for HTTP:
#
http.port: 9200

# --------------------------------- Discovery ----------------------------------
#
discovery.seed_hosts: ["master_node", "new_node1", " new_node2"]
cluster.initial_master_nodes: master_node

# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
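For reference, a minimal 7.x discovery section for the intended three-node cluster might look like the sketch below. Note the stray leading space in " new_node2" above, which would make that hostname unresolvable:

```yaml
# Sketch only: hostnames taken from the post above.
discovery.seed_hosts: ["master_node", "new_node1", "new_node2"]
cluster.initial_master_nodes: ["master_node"]
```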

Contents of the new node's elasticsearch.yml file:

# ---------------------------------- Cluster -----------------------------------
#
cluster.name: Elasticsearch_Test

# ------------------------------------ Node ------------------------------------
#
node.name: new_node1

# ---------------------------------- Network -----------------------------------
#
network.host: a.b.c.d
#
# Set a custom port for HTTP:
#
http.port: 9200

# --------------------------------- Discovery ----------------------------------
#
discovery.seed_providers: master_node
discovery.seed_hosts: ["new_node1", " new_node2", "master_node"]
cluster.initial_master_nodes: master_node
minimum_master_nodes: 2

# ---------------------------------- Gateway -----------------------------------
#
gateway.recover_after_nodes: 2
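Two of the entries above are likely to stop the node from starting: in 7.x, discovery.seed_providers expects a provider type such as file, not a hostname, and minimum_master_nodes is not a valid setting name (the old zen setting was discovery.zen.minimum_master_nodes, which 7.x ignores anyway, and Elasticsearch refuses to start on unknown settings). A pared-down discovery section for the joining node might be:

```yaml
# Sketch only: matches the cluster definition above.
discovery.seed_hosts: ["master_node", "new_node1", "new_node2"]
cluster.initial_master_nodes: ["master_node"]
```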

Trying to start Elasticsearch on new_node1 with these files causes it to error out and not start. I can't figure out whether I have a configuration error or whether something is preventing these hosts from connecting.

Hello @Terran,
it would be great if you could send the error log.

After doing some work with the logs and cleaning up some bad entries in the .yml file, I am left with this output in my logs:

[root@new_node1 elasticsearch]# curl -XGET "http://new_node1.tcore.com:9200/_cluster/health"
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}[root@new_node1 elasticsearch]#

[2019-11-04T11:11:35,282][INFO ][o.e.c.c.JoinHelper ] [new_node1] failed to join {master_node}{CDuQWuHvTbCNQgjRw6TPFw}{LVW2CAYnQK23hC2_fQ53UQ}{10.6.48.235}{10.6.48.235:9300}{dilm}{ml.machine_memory=8203476992, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={new_node1}{5ptJjwi-S2-t6X4yv8fufg}{rToluebFR5eeXHYTdfxuUA}{10.6.48.233}{10.6.48.233:9300}{dilm}{ml.machine_memory=8203476992, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=23, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={new_node1}{5ptJjwi-S2-t6X4yv8fufg}{rToluebFR5eeXHYTdfxuUA}{10.6.48.233}{10.6.48.233:9300}{dilm}{ml.machine_memory=8203476992, xpack.installed=true, ml.max_open_jobs=20}, targetNode={master_node}{CDuQWuHvTbCNQgjRw6TPFw}{LVW2CAYnQK23hC2_fQ53UQ}{10.6.48.235}{10.6.48.235:9300}{dilm}{ml.machine_memory=8203476992, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master_node][10.6.48.235:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [new_node1][10.6.48.233:9300] connect_exception
Caused by: java.io.IOException: No route to host: 10.6.48.233/10.6.48.233:9300
Caused by: java.io.IOException: No route to host

.233 is the new node and .235 is the master. From the look of it, though, it seems that .233 (new_node1) can't talk to itself? Am I reading that right?

Not quite, this exception indicates that the node can talk to the master but the master can't connect back to the new node. Elasticsearch forms connections in both directions.
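To make that two-direction check concrete, here is a minimal sketch (Python, with the addresses from the log as placeholders) of a reachability probe: run it on the master against the new node's transport port, and on the new node against the master's.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers "connection refused", "no route to host", timeouts
        return False


# Elasticsearch needs transport (9300) connectivity in BOTH directions:
#   on the master:   can_connect("10.6.48.233", 9300)   # reach the new node
#   on the new node: can_connect("10.6.48.235", 9300)   # reach the master
```

If the probe fails in one direction only, the problem is on the path toward that target (host firewall, routing), not in Elasticsearch itself.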

I have checked my network status and I don't see anything that would prevent connections. I checked the port settings and I see this:

tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 918/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1414/master
tcp6 0 0 10.6.48.233:9200 :::* LISTEN 6540/java
tcp6 0 0 10.6.48.233:9300 :::* LISTEN 6540/java
tcp6 0 0 :::22 :::* LISTEN 918/sshd
tcp6 0 0 ::1:25 :::* LISTEN 1414/master

It looks like 9300 is open so I am scratching my head.

I was showing my troubles to a teammate and he pointed out that 9200 and 9300 both show binds to tcp6. This doesn't seem right, but I am not sure how I change that. I have attempted to disable IPv6 across the entire server, but these won't switch to IPv4.

I don't think tcp6 is anything to worry about - this is normal and means that it's using the AF_INET6 address family which supports IPv4 and IPv6.

Does curl http://10.6.48.233:9300/ return "This is not an HTTP port" when run on the master node?

I was not quite sure which server you wanted me to run this on, so I did both.
The result of the curl command from the new node (.233) to itself is:
This is not an HTTP port[root@new_node1

The results from the master node (.235) to the new node (.233) is:
curl: (7) Failed connect to 10.6.48.233:9300; Connection timed out

OK, that tells us that tcp6 isn't a problem, since the call worked on the data node, but that there is something in your network configuration preventing the master node from connecting to the data node.

I do not have any firewall or the like between these hosts, and they all share a network (.233, .234, .235). Is there some sort of port setting on the server I need to configure? The screenshot below shows listeners on the correct ports, but am I receiving correctly?

The screenshot you've shared looks normal to me, but there are many other things that could be causing this connectivity issue. I can't really help much more with this kind of issue since it depends so much on your environment.

I have resolved the issue. It turns out you should disable firewalld...
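Rather than disabling firewalld entirely, a less drastic fix is to open just the two Elasticsearch ports on each node (a sketch, assuming a default firewalld setup):

```shell
# Allow Elasticsearch HTTP (9200) and transport (9300) traffic,
# then reload so the permanent rules take effect.
sudo firewall-cmd --permanent --add-port=9200/tcp
sudo firewall-cmd --permanent --add-port=9300/tcp
sudo firewall-cmd --reload
```

This keeps the host firewall active for everything else while letting the nodes reach each other's transport port.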

Thanks for the help.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.