I have a single-node cluster that is operating fine. I want to join two other nodes to this cluster. Each time I try, I either get another single-node cluster or Elasticsearch will not start. Here is the output of the yml on the single node.
Trying to start elasticsearch on new_node1 with these files causes ES to error out and not start. I can't figure out if I have a configuration wrong or if something is preventing these hosts from connecting.
[2019-11-04T11:11:35,282][INFO ][o.e.c.c.JoinHelper ] [new_node1] failed to join {master_node}{CDuQWuHvTbCNQgjRw6TPFw}{LVW2CAYnQK23hC2_fQ53UQ}{10.6.48.235}{10.6.48.235:9300}{dilm}{ml.machine_memory=8203476992, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={new_node1}{5ptJjwi-S2-t6X4yv8fufg}{rToluebFR5eeXHYTdfxuUA}{10.6.48.233}{10.6.48.233:9300}{dilm}{ml.machine_memory=8203476992, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=23, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={new_node1}{5ptJjwi-S2-t6X4yv8fufg}{rToluebFR5eeXHYTdfxuUA}{10.6.48.233}{10.6.48.233:9300}{dilm}{ml.machine_memory=8203476992, xpack.installed=true, ml.max_open_jobs=20}, targetNode={master_node}{CDuQWuHvTbCNQgjRw6TPFw}{LVW2CAYnQK23hC2_fQ53UQ}{10.6.48.235}{10.6.48.235:9300}{dilm}{ml.machine_memory=8203476992, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master_node][10.6.48.235:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [new_node1][10.6.48.233:9300] connect_exception
Caused by: java.io.IOException: No route to host: 10.6.48.233/10.6.48.233:9300
Caused by: java.io.IOException: No route to host
.233 is the new node and .235 is the master. From the look of it, though, .233 (new_node1) can't talk to itself? Am I reading that right?
Not quite. This exception indicates that the new node can talk to the master, but the master can't connect back to the new node. Elasticsearch forms connections in both directions, so both must work.
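Since the join needs a working connection in each direction, one way to check this is to open a plain TCP connection to the transport port from both hosts. The following is a minimal sketch, not part of Elasticsearch; the helper name `can_connect` and the 3-second timeout are assumptions made for illustration, and the addresses in the comments are the ones from this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves that the TCP handshake succeeds; it says nothing
    about whether the Elasticsearch transport protocol is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "No route to host".
        return False


# Hypothetical usage, run once on each host:
#   on the master (.235):   can_connect("10.6.48.233", 9300)
#   on the new node (.233): can_connect("10.6.48.235", 9300)
# Both calls need to return True for the join to have a chance.
```

If the check succeeds from the new node but fails from the master, that matches the exception above: the join request arrives, but the master cannot open its own connection back.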
I was showing my troubles to a teammate and he pointed out that 9200 and 9300 both show binds to tcp6. This doesn't seem right, but I am not sure how I change that. I have attempted to disable IPv6 across the entire server, but these won't switch to IPv4.
I was not quite sure which server you wanted me to run this on, so I did both.
The result of the curl command from the new node (.233) to itself is:
This is not an HTTP port
The result from the master node (.235) to the new node (.233) is:
curl: (7) Failed connect to 10.6.48.233:9300; Connection timed out
OK, that tells us two things. First, tcp6 isn't a problem, since the call worked locally on the data node. Second, something in your network configuration is preventing the master node from connecting to the data node.
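The way a connection attempt fails can narrow down where to look, since a timeout, a refusal, and "No route to host" are reported differently by the operating system. Here is a rough sketch that sorts a failed attempt into those groups; the function names and labels are made up for illustration, and the interpretations in the comments are common causes rather than a diagnosis.

```python
import errno
import socket


def classify_connect_failure(exc: OSError) -> str:
    """Map a connection error to a coarse, illustrative label."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # No answer at all: packets are probably being dropped on the way.
        return "timeout"
    if exc.errno == errno.ECONNREFUSED:
        # The host answered, but nothing is listening on that port.
        return "refused"
    if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        # The OS or a device on the path reported the target unreachable.
        return "no_route"
    return "other"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and return "ok" or a failure label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_connect_failure(exc)


# Hypothetical usage on the master: probe("10.6.48.233", 9300)
```

In this thread the master saw both "No route to host" (in the Elasticsearch log) and a timeout (from curl), which points to something between the two hosts or on the new node itself rather than to Elasticsearch not listening.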
I do not have any firewall or the like between these hosts, and they all share a network (.233, .234, .235). Are there port settings on the server that I need to configure? The output below shows traffic going out on the correct ports, but am I receiving correctly?
The screenshot you've shared looks normal to me, but there are many other things that could be causing this connectivity issue. I can't really help much more with this kind of issue since it depends so much on your environment.