Dockerized elastic cluster "failed to send join request to master"

Hello, I'm trying to set up an Elasticsearch cluster across 2 different hosts, with 2 nodes running in Docker.
I'm running Red Hat Enterprise Linux Server 7.4 (Maipo) with openjdk version "1.8.0_161" and Elasticsearch 6.8.0.

After setting everything up, I'm getting this INFO message:

    2019-07-15T09:22:10.519827796Z [2019-07-15T09:22:10,519][INFO ][o.e.d.z.ZenDiscovery ] [es2] failed to send join request to master [{es3}{YwvBcVrZSLeC45PHJCigHA}{W1GBv-xVRWaEkNsYwtEIBw}{172.16.0.4}{172.16.0.4:9300}], reason [RemoteTransportException[[es3][172.19.0.2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[es2][172.18.0.2:9300] connect_timeout[30s]]; ]

On host 1, with IP address 172.16.0.4, I have this docker-compose file for the master node:

    version: '2'
    services:
      es3:
        container_name: es3
        build:
          context: elasticsearch/
          args:
            ELK_VERSION: $ELK_VERSION
            SG_VERSION: $SG_VERSION
        volumes:
          - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
          - /data/elk_data:/usr/share/elasticsearch/data
        ports:
          - "9200:9200"
          - "9300:9300"
        environment:
          - node.data=false
          - node.master=true
          - node.name=es3
          - "ES_JAVA_OPTS= -Xmx1g -Xms1g"
        ulimits:
          memlock:
            soft: -1
            hard: -1
        networks:
          - elk
    networks:
      elk:
        driver: bridge

and this elasticsearch.yml:

    network.bind_host: 0.0.0.0
    network.publish_host: 172.16.0.4
    #transport.host: _eth0_
    #network.host: 0.0.0.0
    cluster.name: alaa-cluster
    bootstrap.memory_lock: true
    #discovery.zen.ping.unicast.hosts: ["172.16.0.7:9302","172.16.0.7:9301"]
    discovery.zen.ping.unicast.hosts: ["172.16.0.7:9302","176.16.0.4:9300"]
    discovery.zen.minimum_master_nodes: 1

On the second host, whose IP address is 172.16.0.7, the data node is named es2. Here is a sample of its docker-compose file:

      es2:
        container_name: es2
        build:
          context: elasticsearch/
          args:
            ELK_VERSION: $ELK_VERSION
            SG_VERSION: $SG_VERSION
        volumes:
          - ./elasticsearch/config/elasticsearch2.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
        ports:
          - "9202:9200"
          - "9302:9300"
        environment:
          - node.name=es2
          - node.data=true
          - node.master=false
          - "ES_JAVA_OPTS= -Xmx1g -Xms1g"
        ulimits:
          memlock:
            soft: -1
            hard: -1
        networks:
          - elk
    networks:
      elk:
        driver: bridge

and its elasticsearch.yml file, which is named elasticsearch2.yml:

    #network.bind_host: 0.0.0.0
    #network.publish_host: 172.18.0.3
    #network.host: 0.0.0.0
    network.host: _eth0_
    #transport.host: 0.0.0.0
    #transport.port 9302
    cluster.name: alaa-cluster
    bootstrap.memory_lock: true
    #discovery.zen.ping.unicast.hosts: [es1,172.16.0.4]
    discovery.zen.minimum_master_nodes: 1
    discovery.zen.ping.unicast.hosts: [172.16.0.4,"172.16.0.7:9302"]

Here are the logs on the data node es2:

    2019-07-15T10:16:50.408729761Z [2019-07-15T10:16:50,406][INFO ][o.e.d.DiscoveryModule ] [es2] using discovery type [zen] and host providers [settings]
    2019-07-15T10:16:51.240730607Z [2019-07-15T10:16:51,237][INFO ][o.e.n.Node ] [es2] initialized
    2019-07-15T10:16:51.240771907Z [2019-07-15T10:16:51,237][INFO ][o.e.n.Node ] [es2] starting ...
    2019-07-15T10:16:51.449721366Z [2019-07-15T10:16:51,448][INFO ][o.e.t.TransportService ] [es2] publish_address {172.18.0.2:9300}, bound_addresses {172.18.0.2:9300}
    2019-07-15T10:16:51.469678815Z [2019-07-15T10:16:51,468][INFO ][o.e.b.BootstrapChecks ] [es2] bound or publishing to a non-loopback address, enforcing bootstrap checks
    2019-07-15T10:17:21.534389997Z [2019-07-15T10:17:21,534][INFO ][o.e.h.n.Netty4HttpServerTransport] [es2] publish_address {172.18.0.2:9200}, bound_addresses {172.18.0.2:9200}
    2019-07-15T10:19:51.995301653Z [2019-07-15T10:19:51,994][INFO ][o.e.d.z.ZenDiscovery ] [es2] failed to send join request to master [{es3}{YwvBcVrZSLeC45PHJCigHA}{nzUbrmnrTIuE6OPeVMAzYw}{172.16.0.4}{172.16.0.4:9300}], reason [RemoteTransportException[[es3][172.19.0.2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[es2][172.18.0.2:9300] connect_timeout[30s]]; ]

As for the master node, here are the logs:

    2019-07-15T10:17:06.730431822Z [2019-07-15T10:17:06,730][INFO ][c.f.s.c.ComplianceConfig ] [es3] PII configuration [auditLogPattern=null, auditLogIndex=null]: {}
    2019-07-15T10:17:07.117394240Z [2019-07-15T10:17:07,116][INFO ][o.e.d.DiscoveryModule ] [es3] using discovery type [zen] and host providers [settings]
    2019-07-15T10:17:07.874398723Z [2019-07-15T10:17:07,873][INFO ][o.e.n.Node ] [es3] initialized
    2019-07-15T10:17:07.874444022Z [2019-07-15T10:17:07,873][INFO ][o.e.n.Node ] [es3] starting ...
    2019-07-15T10:17:08.055136517Z [2019-07-15T10:17:08,054][INFO ][o.e.t.TransportService ] [es3] publish_address {172.16.0.4:9300}, bound_addresses {0.0.0.0:9300}
    2019-07-15T10:17:08.072847368Z [2019-07-15T10:17:08,072][INFO ][o.e.b.BootstrapChecks ] [es3] bound or publishing to a non-loopback address, enforcing bootstrap checks
    2019-07-15T10:17:08.086368930Z [2019-07-15T10:17:08,086][INFO ][c.f.s.c.IndexBaseConfigurationRepository] [es3] Check if searchguard index exists ...
    2019-07-15T10:17:11.158815537Z [2019-07-15T10:17:11,157][INFO ][o.e.c.s.MasterService ] [es3] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {es3}{YwvBcVrZSLeC45PHJCigHA}{nzUbrmnrTIuE6OPeVMAzYw}{172.16.0.4}{172.16.0.4:9300}
    2019-07-15T10:17:11.173428196Z [2019-07-15T10:17:11,169][INFO ][o.e.c.s.ClusterApplierService] [es3] new_master {es3}{YwvBcVrZSLeC45PHJCigHA}{nzUbrmnrTIuE6OPeVMAzYw}{172.16.0.4}{172.16.0.4:9300}, reason: apply cluster state (from master [master {es3}{YwvBcVrZSLeC45PHJCigHA}{nzUbrmnrTIuE6OPeVMAzYw}{172.16.0.4}{172.16.0.4:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
    2019-07-15T10:17:11.200974219Z [2019-07-15T10:17:11,200][INFO ][o.e.h.n.Netty4HttpServerTransport] [es3] publish_address {172.16.0.4:9200}, bound_addresses {0.0.0.0:9200}
    2019-07-15T10:17:11.201196118Z [2019-07-15T10:17:11,200][INFO ][o.e.n.Node ] [es3] started
    2019-07-15T10:17:11.221827960Z [2019-07-15T10:17:11,220][INFO ][o.e.g.GatewayService ] [es3] recovered [0] indices into cluster_state

I'm guessing it's a networking issue that has to do with my configuration in elasticsearch.yml. It's worth mentioning that from inside the master node container es3 I can curl 172.16.0.7:9202 and get an answer,
and likewise from inside the data node container I can curl 172.16.0.4:9200 and get an answer.

Just to clarify: the master's host IP is 172.16.0.4 and its container IP is 172.19.0.2. The data node's host IP is 172.16.0.7 and its container IP is 172.18.0.2.
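
The addresses listed above can be matched against the join failure logged by es2. The following is a minimal Python sketch, not part of the original setup, that pulls the node name and address out of each transport exception in that message. The names `JOIN_FAILURE`, `_EXCEPTION` and `transport_exceptions` are illustrative, and the regular expression only assumes the message layout seen in the log line quoted in this post.

```python
import re

# Message portion of the ZenDiscovery log line from es2, copied from the logs above.
JOIN_FAILURE = (
    "failed to send join request to master "
    "[{es3}{YwvBcVrZSLeC45PHJCigHA}{nzUbrmnrTIuE6OPeVMAzYw}{172.16.0.4}{172.16.0.4:9300}], "
    "reason [RemoteTransportException[[es3][172.19.0.2:9300][internal:discovery/zen/join]]; "
    "nested: ConnectTransportException[[es2][172.18.0.2:9300] connect_timeout[30s]]; ]"
)

# Matches e.g. "ConnectTransportException[[es2][172.18.0.2:9300]".
_EXCEPTION = re.compile(r"(\w+TransportException)\[\[(\w+)\]\[([\d.]+):(\d+)\]")


def transport_exceptions(message):
    """Return (exception, node, host, port) tuples in the order they appear."""
    return [
        (exc, node, host, int(port))
        for exc, node, host, port in _EXCEPTION.findall(message)
    ]
```

For the message above, `transport_exceptions(JOIN_FAILURE)` yields an entry for es3 at 172.19.0.2:9300 followed by one for es2 at 172.18.0.2:9300. Read together with the container IPs, this suggests that the timeout happened on the master's side while it was connecting to es2's container address (172.18.0.2:9300), which is also the publish_address es2 reports in its own log, rather than to the host-mapped 172.16.0.7:9302.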

Thanks in advance for the help and best regards :slight_smile:

Hi @Alaa_bdira,

Yes, it looks like a networking issue to me too.

The nodes do not communicate with each other on the HTTP ports you tested with curl (9200 and 9202). Instead they use transport ports like 9300:

{es3}{YwvBcVrZSLeC45PHJCigHA}{nzUbrmnrTIuE6OPeVMAzYw}{172.16.0.4}{172.16.0.4:9300}

You should check connectivity on these ports.
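
One way to run such a check is a plain TCP connection attempt, since curl speaks HTTP and the transport port does not. The following is a minimal Python sketch under a few assumptions: the helper names `can_connect` and `report` and the `TRANSPORT_ADDRESSES` table are illustrative, and the addresses are simply the ones that appear in the logs and compose files in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


# Addresses taken from this thread; adjust them to your own setup.
TRANSPORT_ADDRESSES = {
    "es3 publish_address": ("172.16.0.4", 9300),
    "es2 publish_address": ("172.18.0.2", 9300),
    "es2 host-mapped transport port": ("172.16.0.7", 9302),
}


def report(addresses, timeout=3.0):
    """Map each label to whether its address accepted a TCP connection."""
    return {
        label: can_connect(host, port, timeout)
        for label, (host, port) in addresses.items()
    }
```

Calling `report(TRANSPORT_ADDRESSES)` from inside each container shows which transport addresses that node can actually reach. The address that matters for joining is the publish_address each node advertises, because that is the address the other node connects back to.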

@DavidTurner
Hello, thanks for the quick response. I also get an answer on the transport ports: on es2 when I curl 172.16.0.4:9300, and on the master node es3 when I curl 172.16.0.7:9302.
