Can't Join 3rd Node to Cluster

Hi Everyone,
I am trying to set up a 3-node cluster, running each ES node in a Docker container on a separate physical host. I was able to set up es01 as the initial member and add the second node, es02, but I cannot get the third node, es03, to join the cluster. es03 is on a separate network, but the firewall rules seem to work (I can curl/telnet to the various ports). I have a Logstash instance on the same host, and it successfully sends data to es01/es02, yet es03 still will not join.

I turned on TRACE logging, and es03 just keeps repeating the discovery cycle. I believe it is discovering the other nodes correctly, but for some reason it will not join the cluster. Any insight?
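Whether es03 has joined can also be checked from the master's side with the `_cat/nodes` API. A minimal sketch, assuming es01 serves HTTP on 192.168.0.10:9200 without authentication (es01's HTTP settings are not shown in this thread, so that URL is an assumption):

```python
import json
import urllib.request

# Assumption: es01 serves HTTP on 192.168.0.10:9200 with no authentication.
CAT_NODES_URL = "http://192.168.0.10:9200/_cat/nodes?format=json&h=name,ip,master"


def fetch_cat_nodes(url=CAT_NODES_URL, timeout=5.0):
    """Return the raw _cat/nodes response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def node_names(cat_nodes_json):
    """Return the sorted node names from a _cat/nodes?format=json body."""
    return sorted(entry["name"] for entry in json.loads(cat_nodes_json))


if __name__ == "__main__":
    try:
        names = node_names(fetch_cat_nodes())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"could not query {CAT_NODES_URL}: {exc}")
    else:
        print("cluster members:", ", ".join(names))
        print("es03 has joined" if "es03" in names else "es03 has NOT joined")
```

In the JSON output the `master` column holds `*` for the elected master and `-` for the other nodes, so the same request also shows which node currently holds the master role.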

es03 log:

{"type": "server", "timestamp": "2021-08-24T21:51:25,261Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.11:9300, discoveryNode={es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=false} requesting peers" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.10:9300, discoveryNode={es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=false} requesting peers" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "probing master nodes from cluster state: nodes: \n   {es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}{ml.machine_memory=33464684544, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=12884901888}, local\n" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "startProbe(192.168.1.10:9301) not probing local node" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "resolved host [192.168.0.11:9300] to [192.168.0.11:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "resolved host [192.168.0.10:9300] to [192.168.0.10:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "probing resolved transport addresses [192.168.0.11:9300, 192.168.0.10:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,267Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.10:9300, discoveryNode={es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional[{es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}], knownPeers=[], term=13}" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,267Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.11:9300, discoveryNode={es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional[{es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}], knownPeers=[], term=13}" }
{"type": "server", "timestamp": "2021-08-24T21:51:26,106Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es01] to bootstrap a cluster: have discovered [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}, {es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}, {es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}]; discovery will continue using [192.168.0.11:9300, 192.168.0.10:9300] from hosts providers and [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}] from last-known cluster state; node term 13, last-accepted version 0 in term 0" }
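Each entry in the es03 log above is a single JSON object, so the repeating discovery cycle can be summarized mechanically. A minimal sketch, assuming the log has been saved to a file passed as the first argument. The regular expression is my own and is fitted to the 7.13 message format shown here, not a stable Elasticsearch interface:

```python
import json
import re
import sys

# Hypothetical pattern fitted to the PeerFinder "received PeersResponse" messages above:
# captures the peer that answered, the master it reported, and the term.
PEERS_RESPONSE = re.compile(
    r"discoveryNode=\{(?P<peer>[^}]+)\}.*"
    r"received PeersResponse\{masterNode=Optional\[\{(?P<master>[^}]+)\}.*term=(?P<term>\d+)\}"
)


def summarize(lines):
    """Yield (peer, reported_master, term) for each PeersResponse log entry."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines and multi-line entries such as stack traces
        if not isinstance(record, dict):
            continue
        match = PEERS_RESPONSE.search(record.get("message", ""))
        if match:
            yield match["peer"], match["master"], int(match["term"])


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for peer, master, term in summarize(fh):
            print(f"{peer} reports master {master} in term {term}")
```

Applied to the lines above, it would report that both es01 and es02 name es01 as master in term 13, which is consistent with discovery itself working and the failure happening at the join step.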

On the currently elected master (es01), the error below appears only when I stop es03:

{"type": "server", "timestamp": "2021-08-24T21:49:40,345Z", "level": "WARN", "component": "o.e.c.c.Coordinator", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "failed to validate incoming join request from node [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{mZ3w_2SUQx-_pgJgPqoaHw}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}{ml.machine_memory=33464684544, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}]", "cluster.uuid": "BGTrs1VSQDKnYxwVMqVRCg", "node.id": "xsVvKOd2SBeYzJgpW-b5RQ" ,
"stacktrace": ["org.elasticsearch.transport.NodeDisconnectedException: [es03][192.168.1.10:9301][internal:cluster/coordination/join/validate] disconnected"] }
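The failing action in this stack trace, internal:cluster/coordination/join/validate, is sent from es01 to es03's published transport address 192.168.1.10:9301. One plausible reading (not confirmed in this thread) is that traffic from es01's host to that address is being dropped, even though connections from es03 to es01/es02 work. A minimal sketch of a plain TCP connect test, equivalent to the telnet check, that can be run on each host; the address list comes from the logs above and the 2-second timeout is an arbitrary choice:

```python
import socket

# Transport addresses taken from the logs in this thread.
TRANSPORT_ADDRESSES = [
    ("192.168.0.10", 9300),  # es01
    ("192.168.0.11", 9300),  # es02
    ("192.168.1.10", 9301),  # es03, as published to the master
]


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for host, port in TRANSPORT_ADDRESSES:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

Running it on es01's host is the interesting case: if 192.168.1.10:9301 shows as NOT reachable there, that would suggest the firewall is only open in the es03-to-es01 direction. A successful connect only shows that the port accepts TCP connections; it does not rule out problems later in the transport handshake.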

Welcome to our community! :smiley:

Can you share your elasticsearch.yml?

Thanks @warkolm!

There isn't much in the elasticsearch.yml file, so I've also included the environment variables that are set and my docker-compose file. I've tried removing cluster.initial_master_nodes, since my understanding is that it is only required when the cluster is first bootstrapped. I've also tried both hostnames and IP addresses in discovery.seed_hosts.

elasticsearch.yml:

network.host: 0.0.0.0
logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService: TRACE
logger.org.elasticsearch.discovery: TRACE

Environment variables:

bootstrap.memory_lock=true
http.port=9201
LANG=en_US.UTF-8
HOSTNAME=host3.example.com
node.name=es03
ELASTIC_CONTAINER=true
cluster.initial_master_nodes=es01
transport.tcp.port=9301
XPACK_REPORTING_ENABLED=true
XPACK_MONITORING_ENABLED=true
XPACK_SECURITY_ENABLED=false
discovery.seed_hosts=192.168.0.10:9300,192.168.0.11:9300
ES_JAVA_OPTS=-Xms12288m -Xmx12288m
cluster.name=es-docker-cluster

Docker Compose file:

version: '3.8'
services:
  es03:
    image: elasticsearch:7.13.2
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=192.168.0.10:9300,192.168.0.11:9300
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms12288m -Xmx12288m"
      - XPACK_SECURITY_ENABLED=false
      - XPACK_REPORTING_ENABLED=true
      - XPACK_MONITORING_ENABLED=true
      - http.port=9201
      - transport.tcp.port=9301
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /u01/docker/elk_workspace/elastic_data/es03:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
      - 9301:9300
    network_mode: host
    restart: always

Can anyone suggest next steps for debugging this?

Thanks in advance!
