Failed to connect to master node


(Steve Button) #1

Hi,

We are having problems setting up a three-node cluster on Azure using Docker. Each node is started with its own docker run command, along the lines of :-

sudo docker run --net=bridge -d \
  -e "node.master=true" \
  -e "transport.host=0.0.0.0" \
  -e "network.host=0.0.0.0" \
  -e "network.bind_host=0.0.0.0" \
  -e "network.publish_host=0.0.0.0" \
  -e "node.name=esuk02" \
  -e "discovery.zen.minimum_master_nodes=2" \
  -e "ELASTIC_PASSWORD=xxx" \
  -e "bootstrap.memory_lock=true" \
  -e "ES_JAVA_OPTS=-Xms8g -Xmx8g" \
  -e "xpack.security.http.ssl.enabled=true" \
  -e "xpack.security.transport.ssl.enabled=true" \
  -e "xpack.security.transport.ssl.verification_mode=certificate" \
  -e "xpack.ssl.certificate_authorities=x-pack/certificates/certs/ca/ca.crt" \
  -e "xpack.ssl.certificate=x-pack/certificates/certs/ukselastic2/ukselastic2.crt" \
  -e "xpack.ssl.key=x-pack/certificates/certs/ukselastic2/ukselastic2.key" \
  -e "discovery.zen.ping.unicast.hosts=10.4.0.10,10.4.0.11,10.4.0.12" \
  --mount type=bind,source=/var/lib/waagent,target=/usr/share/elasticsearch/config/x-pack/certificates,readonly \
  -p 9200:9200 \
  -p 9300:9300 \
  --ulimit memlock=-1:-1 \
  docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.0

The commands are generated by Terraform, which substitutes the per-node items such as node.name and the certificate paths.

In our logs we're seeing :-

which I think looks OK. But on node 2 :-

And likewise we get a similar thing on node 3.

I can netcat from each node to the others on 9200 and 9300, using either the 10.4.0.x address or the 172.17.0.x address, both from the host and from a bash prompt inside the Docker container. That all works just fine, so I don't think I have a connectivity or firewall problem. It must be something wrong with my ES config then (right?)

Here's the docker run for node 1 :-

sudo docker run --net=bridge -d \
  -e "node.master=true" \
  -e "transport.host=0.0.0.0" \
  -e "network.host=0.0.0.0" \
  -e "node.name=esuk01" \
  -e "discovery.zen.minimum_master_nodes=2" \
  -e "ELASTIC_PASSWORD=xxx" \
  -e "bootstrap.memory_lock=true" \
  -e "ES_JAVA_OPTS=-Xms8g -Xmx8g" \
  -e "xpack.security.http.ssl.enabled=true" \
  -e "xpack.security.transport.ssl.enabled=true" \
  -e "xpack.security.transport.ssl.verification_mode=certificate" \
  -e "xpack.ssl.certificate_authorities=x-pack/certificates/certs/ca/ca.crt" \
  -e "xpack.ssl.certificate=x-pack/certificates/certs/ukselastic1/ukselastic1.crt" \
  -e "xpack.ssl.key=x-pack/certificates/certs/ukselastic1/ukselastic1.key" \
  -e "discovery.zen.ping.unicast.hosts=10.4.0.10,10.4.0.12" \
  --mount type=bind,source=/var/lib/waagent,target=/usr/share/elasticsearch/config/x-pack/certificates,readonly \
  -p 9200:9200 \
  -p 9300:9300 \
  --ulimit memlock=-1:-1 \
  docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.0

and node 0 :-

sudo docker run --net=bridge -d \
  -e "node.master=true" \
  -e "transport.host=0.0.0.0" \
  -e "network.host=0.0.0.0" \
  -e "node.name=esuk00" \
  -e "discovery.zen.minimum_master_nodes=2" \
  -e "ELASTIC_PASSWORD=xxx" \
  -e "bootstrap.memory_lock=true" \
  -e "ES_JAVA_OPTS=-Xms8g -Xmx8g" \
  -e "xpack.security.http.ssl.enabled=true" \
  -e "xpack.security.transport.ssl.enabled=true" \
  -e "xpack.security.transport.ssl.verification_mode=certificate" \
  -e "xpack.ssl.certificate_authorities=x-pack/certificates/certs/ca/ca.crt" \
  -e "xpack.ssl.certificate=x-pack/certificates/certs/ukselastic0/ukselastic0.crt" \
  -e "xpack.ssl.key=x-pack/certificates/certs/ukselastic0/ukselastic0.key" \
  --mount type=bind,source=/var/lib/waagent,target=/usr/share/elasticsearch/config/x-pack/certificates,readonly \
  -p 9200:9200 \
  -p 9300:9300 \
  --ulimit memlock=-1:-1 \
  docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.0

Many thanks!

Steve Button


(Steve Button) #2

Actually, that's not quite true. From inside the container I can run

nc -v 172.17.0.1 9200
nc -v 172.17.0.1 9300
nc -v 172.17.0.2 9200
nc -v 172.17.0.2 9300

and they all work OK, but 172.17.0.3 doesn't work. I guess I should change network.host or transport.host to bind to the 10.4.0.0 network instead of 0.0.0.0, which I guess means ANY interface. I'm a little unclear what each of these parameters actually does, and the docs don't expand on it much (at least the ones I was looking at).
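
For what it's worth, a rough sketch of how these settings relate in the 6.x networking settings: network.host sets both the address a node binds to and the address it publishes to other nodes, network.bind_host and network.publish_host set those two separately, and transport.host overrides network.host for the transport (9300) layer only. The helper below is hypothetical (es_network_flags is not part of Docker or Elasticsearch), and it assumes the 10.4.0.x address belongs to the VM rather than to an interface inside the bridge-networked container, so that it can be published but not bound.

```shell
#!/bin/sh
# Hypothetical helper: print the network-related -e flags for one node.
# $1 is assumed to be the VM's private address (for example 10.4.0.11),
# i.e. an address the other nodes can reach through the -p 9300:9300 mapping.
es_network_flags() {
  host_ip="$1"
  # Listen on every interface that exists inside the container.
  printf '%s\n' "-e network.bind_host=0.0.0.0"
  # Advertise the VM address to the rest of the cluster.
  printf '%s\n' "-e network.publish_host=${host_ip}"
}

# Example: flags that could replace the network.* / transport.host flags for esuk01.
es_network_flags 10.4.0.11
```

Whether this is enough for the nodes to find each other depends on the rest of the setup, so treat it as something to try rather than a confirmed fix.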


(Steve Button) #3

I'm still looking into this. If I set transport.host to the IP of the node (which is shared with the docker container), for example 10.4.0.11, then I get "BindException[Cannot assign requested address];" and it complains about ports 9300-9400. When I set transport.host to 0.0.0.0 it keeps running, but I get this message :-

[2018-02-15T11:43:39,122][WARN ][o.e.d.z.ZenDiscovery     ] [esuk01] failed to connect to master [{esuk00}{7AKsFpnVT2CazmBegPA4Lw}{W_f7IPanQc6PgXOqO5q0MA}{172.17.0.2}{172.17.0.2:9300}{ml.machine_memory=16797728768, ml.max_open_jobs=20, ml.enabled=etrying...
org.elasticsearch.transport.ConnectTransportException: [esuk00][172.17.0.2:9300] handshake failed. unexpected remote node {esuk01}{KllV7qs6R5WjFl0ZFqsvjA}{xiKH4kRKTOa9zIWbiQnEWA}{172.17.0.2}{172.17.0.2:9300}{ml.machine_memory=16797728768, ml.max_o20, ml.enabled=true}

This is interesting, because the message is from node esuk01, so it must be talking to esuk00 to get it (otherwise how would it know of the existence of "esuk00"?). So I guess the question I need to answer is: what does "failed to connect to master" actually mean? What makes that happen?
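
A small sketch for reading the "handshake failed" line in the log above: it pulls out which node was expected at the address and which node actually answered. The function name summarize_handshake is made up for illustration, and the pattern only covers the message shape shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: summarize a "handshake failed" line from the ES log.
# Reads log lines on stdin and prints, for each matching line:
#   expected=<node name> address=<ip:port> found=<node name>
summarize_handshake() {
  sed -n 's/.*: \[\([^]]*\)]\[\([^]]*\)] handshake failed\. unexpected remote node {\([^}]*\)}.*/expected=\1 address=\2 found=\3/p'
}

# Example using the (truncated) exception line from this thread.
line='org.elasticsearch.transport.ConnectTransportException: [esuk00][172.17.0.2:9300] handshake failed. unexpected remote node {esuk01}{KllV7qs6R5WjFl0ZFqsvjA}'
printf '%s\n' "$line" | summarize_handshake
# prints: expected=esuk00 address=172.17.0.2:9300 found=esuk01
```

Read that way, esuk01 dialed 172.17.0.2:9300 expecting esuk00, and the node that answered identified itself as esuk01. Since the log also gives esuk01's own address as 172.17.0.2, one possible reading (an assumption, not something the log confirms) is that each container sits on its own host's default Docker bridge with the same 172.17.0.2 address, so esuk01 ended up connecting to itself.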

