I have an Elasticsearch 6.6 cluster with two nodes running on one server using docker-compose.
I want to add another node running on a separate server to get more physical resources.
I configured it the same way as the first two nodes, using the discovery.zen.ping.unicast.hosts setting.
But when I started the node, I got a timeout error after the initial discovery.
The reason for the timeout is that the first two nodes' bound IP addresses are Docker-internal addresses, which the new node obviously can't route to (although the hosts set for discovery use the external address of the host).
What is the solution for setting up Docker Elasticsearch nodes on separate servers, then?
The only quick and dirty solution I found is using a prerouting iptables rule to masquerade the inner subnets between the servers.
I upgraded my Elasticsearch to 6.8.18, and the issue persists.
I exposed the port used for transport.tcp.port, and with the iptables hack it works. Without it, the node still tries to connect to the container's internal IP, which is obviously unreachable from a container on another server.
This is not an Elasticsearch issue, and the version you are using has no influence. It is a connectivity issue related to how you are trying to start your cluster.
You need to make sure that your nodes can communicate with each other, that your nodes running on docker can resolve and connect to your other node outside docker and that this node can also resolve and connect to the docker nodes.
Maybe someone had a similar problem and can help, but this is an infrastructure issue, not an Elasticsearch one.
I dunno, it sort of is an Elasticsearch issue, at least it's a discrepancy between Elasticsearch's network requirements and how the network is actually set up at present. I don't think the 6.8 reference manual makes it very clear, but the relevant section in the 7.15 docs also largely applies to 6.8:
In particular (emphasis mine):
Each Elasticsearch node has an address at which clients and other nodes can contact it, known as its publish address. Each node has one publish address for its [...] transport interface [...] Each node must be accessible at its transport publish address by all other nodes in its cluster.
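To make that requirement concrete, here is a rough sketch of how the bind and publish addresses can be set separately for a node in a bridge-networked container. I haven't tested this against your setup, and the IP address and published port are placeholders I made up for illustration. The idea is that the node binds to an address that exists inside the container but advertises the address and port at which the Docker host exposes it.

```yaml
# elasticsearch.yml for one containerized node (sketch only, values are placeholders)

# Listen on all interfaces inside the container, so the port mapping can reach the node.
network.bind_host: 0.0.0.0

# Address advertised to other nodes. Assumed here to be the Docker host's routable IP;
# 192.0.2.10 is a documentation placeholder, not an address from this thread.
network.publish_host: 192.0.2.10

# Port the node binds to inside the container.
transport.tcp.port: 9300

# Port advertised to other nodes. Assumed to be the host port that Docker maps
# to the container's transport port.
transport.publish_port: 9300
```

If two nodes share one server, each would need its own published transport port, and transport.publish_port would have to match whichever host port is mapped to that node.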
Let's say, for the following comment, that the IP addresses of the two servers are 1.2.3.4 and 1.2.3.5.
My nodes are accessible using the host server's IP and published port (curling 1.2.3.4:9300 from 1.2.3.5 returns "this is not an HTTP port", as expected).
The problem is that the zen hosts set for one of 1.2.3.4's nodes is discovery.zen.ping.unicast.hosts: "1.2.3.5:9300,1.2.3.5:9300" (the ports are published through Docker and the port is set with transport.tcp.port=9300),
and the initial connection is made just fine, but then the connection to a node on the other server fails with a timeout.
When looking at the logs, it says the other node's IP is 172...3 (the Docker-internal IP) and not 1.2.3.4.
I believe the node tells its address to the new node after the initial negotiation, and the address it gives is the one set as publish_address.
I tried to change the publish address to 1.2.3.4, but obviously it didn't work because the container doesn't recognize this address.
I also tried 0.0.0.0, but because the node listened on it, it just used the container address again.
And finally, as I said in the first post, if I masquerade the address using iptables (making the host think 172... is 1.2.3.4, and the other way around on the other server), it works fine.
Secondly, I get the problem now. I assume my setup (a few Docker nodes on two separate servers) isn't really supported by Elastic.
And I guess the solution is to set up a single node, running natively, on each server, or to run the Docker nodes on some kind of cluster such as k8s. Am I right?
The simplest approach is just to use a host network rather than a bridge one, but you can get fancy with k8s if you want. Note that the docs on bridge networking say this:
Bridge networks apply to containers running on the same Docker daemon host. For communication among containers running on different Docker daemon hosts, you can either manage routing at the OS level, or you can use an overlay network.
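For the host-network option, a minimal docker-compose sketch might look something like the following. The service name, node name, and discovery list are assumptions for illustration only; the IP addresses are the example ones used earlier in this thread. With host networking the container shares the server's network stack, so the node can bind to and publish the server's own address directly.

```yaml
# docker-compose.yml on the server at 1.2.3.4 (sketch only, names are hypothetical)
version: "2.2"
services:
  es-node-1:                      # hypothetical service name
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.18
    network_mode: host            # use the server's network stack instead of a bridge
    environment:
      - node.name=es-node-1       # hypothetical node name
      - network.host=1.2.3.4      # bind and publish the server's own address
      - http.port=9200
      - transport.tcp.port=9300
      # Assumed layout: a second node on this server at 9301, the new server's node at 9300.
      - discovery.zen.ping.unicast.hosts=1.2.3.4:9301,1.2.3.5:9300
```

Note that a `ports:` mapping has no effect in host mode, so two nodes on the same server would each need distinct http.port and transport.tcp.port values.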