Can't Join 3rd Node to Cluster

frustrated23 · August 24, 2021, 9:59pm

Hi Everyone,
I am trying to set up a 3-node cluster, running each ES node in a Docker container on a separate physical host. I was able to set up es01 as the initial member and add the second node, es02, but I cannot get es03 to join the cluster. The third node is on a separate network, but the firewall rules seem to work (I can curl/telnet to the various ports). I also have a Logstash instance on the same host, and it successfully sends data to es01/es02, yet es03 still will not join.

I turned on trace logging, and discovery just repeats continuously. I believe es03 is discovering the other nodes properly, but for some reason it will not join the cluster. Any insight?
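To confirm which nodes the cluster itself reports as members, one option is to query the `_cat/nodes` API on es01. This is a minimal standard-library Python sketch: `fetch_cat_nodes`, `joined_node_names` and `missing_nodes` are hypothetical helpers, es01's HTTP address is an assumption (the thread only shows es03's `http.port=9201`), and it assumes the HTTP endpoint accepts unauthenticated requests.

```python
import json
import urllib.request
from typing import Set


def fetch_cat_nodes(base_url: str, timeout: float = 5.0) -> str:
    """GET /_cat/nodes as JSON, returning only the name and ip columns."""
    # Assumption: no authentication is required on this HTTP endpoint.
    url = f"{base_url}/_cat/nodes?format=json&h=name,ip"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def joined_node_names(cat_nodes_body: str) -> Set[str]:
    """Extract node names from a _cat/nodes?format=json response body."""
    return {row["name"] for row in json.loads(cat_nodes_body)}


def missing_nodes(cat_nodes_body: str, expected: Set[str]) -> Set[str]:
    """Names in `expected` that the cluster does not currently list."""
    return expected - joined_node_names(cat_nodes_body)
```

For example, `missing_nodes(fetch_cat_nodes("http://192.168.0.10:9200"), {"es01", "es02", "es03"})`, with the address adjusted to wherever es01's HTTP port actually listens, would be expected to return `{"es03"}` for as long as es03 has not joined.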

es03 log:

{"type": "server", "timestamp": "2021-08-24T21:51:25,261Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.11:9300, discoveryNode={es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=false} requesting peers" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.10:9300, discoveryNode={es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=false} requesting peers" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "probing master nodes from cluster state: nodes: \n   {es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}{ml.machine_memory=33464684544, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=12884901888}, local\n" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "startProbe(192.168.1.10:9301) not probing local node" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "resolved host [192.168.0.11:9300] to [192.168.0.11:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "resolved host [192.168.0.10:9300] to [192.168.0.10:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "probing resolved transport addresses [192.168.0.11:9300, 192.168.0.10:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,267Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.10:9300, discoveryNode={es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional[{es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}], knownPeers=[], term=13}" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,267Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.11:9300, discoveryNode={es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional[{es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}], knownPeers=[], term=13}" }
{"type": "server", "timestamp": "2021-08-24T21:51:26,106Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es01] to bootstrap a cluster: have discovered [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}, {es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}, {es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}]; discovery will continue using [192.168.0.11:9300, 192.168.0.10:9300] from hosts providers and [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}] from last-known cluster state; node term 13, last-accepted version 0 in term 0" }
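With TRACE logging on, the lines that matter (such as the ClusterFormationFailureHelper warning above) are easy to lose among the PeerFinder entries. Since the server log is JSON with one entry per line, a small filter helps. This is a minimal Python sketch with hypothetical helper names; it keeps WARN/ERROR entries and simply skips anything that is not a complete single-line JSON object.

```python
import json
from typing import Iterable, Iterator, Tuple


def interesting_entries(lines: Iterable[str],
                        levels: Tuple[str, ...] = ("WARN", "ERROR")) -> Iterator[dict]:
    """Yield parsed log entries whose "level" field is in `levels`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # blank lines or non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. an entry whose stacktrace is wrapped onto the next line
        if entry.get("level") in levels:
            yield entry


def summarize(entry: dict) -> str:
    """One-line summary: timestamp, component and message."""
    return f'{entry.get("timestamp")} {entry.get("component")}: {entry.get("message")}'
```

One way to use it is to read `sys.stdin` in a small script, pipe `docker logs es03 2>&1` into it, and print `summarize(e)` for each entry returned by `interesting_entries`.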

On the currently elected master node (es01), the error below appears only when I stop es03:

{"type": "server", "timestamp": "2021-08-24T21:49:40,345Z", "level": "WARN", "component": "o.e.c.c.Coordinator", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "failed to validate incoming join request from node [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{mZ3w_2SUQx-_pgJgPqoaHw}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}{ml.machine_memory=33464684544, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}]", "cluster.uuid": "BGTrs1VSQDKnYxwVMqVRCg", "node.id": "xsVvKOd2SBeYzJgpW-b5RQ" ,
"stacktrace": ["org.elasticsearch.transport.NodeDisconnectedException: [es03][192.168.1.10:9301][internal:cluster/coordination/join/validate] disconnected"] }
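The stack trace names the `internal:cluster/coordination/join/validate` action against `[es03][192.168.1.10:9301]`. As far as I understand Elasticsearch 7.x, the elected master handles a join request by opening its own transport connection to the joining node's published address and sending it this validation request, so the direction from the es01 host to 192.168.1.10:9301 matters as much as es03's connections outward. A minimal Python sketch for a TCP-level check, using a hypothetical `can_connect` helper, meant to be run on the es01 (and es02) hosts:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port opens within `timeout` seconds.

    This does not speak the Elasticsearch transport protocol; success only
    shows that the port is reachable from the machine running the check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

For example, `can_connect("192.168.1.10", 9301)` on the es01 host. A `True` result does not rule out a network problem further up the stack, since the check never exchanges any Elasticsearch messages.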

warkolm · August 25, 2021, 12:58am

Welcome to our community!

Can you share your elasticsearch.yml?

frustrated23 · August 25, 2021, 2:13am

Thanks @warkolm !

There isn't much in elasticsearch.yml, so I've also included the environment variables that are set and my docker-compose file. I've tried removing cluster.initial_master_nodes, since my understanding is that it is only needed when the cluster is first bootstrapped. I've also tried both hostnames and IP addresses in discovery.seed_hosts.

elasticsearch.yml:

network.host: 0.0.0.0
logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService: TRACE
logger.org.elasticsearch.discovery: TRACE

environment variables:

bootstrap.memory_lock=true
http.port=9201
LANG=en_US.UTF-8
HOSTNAME=host3.example.com
node.name=es03
ELASTIC_CONTAINER=true
cluster.initial_master_nodes=es01
transport.tcp.port=9301
XPACK_REPORTING_ENABLED=true
XPACK_MONITORING_ENABLED=true
XPACK_SECURITY_ENABLED=false
discovery.seed_hosts=192.168.0.10:9300,192.168.0.11:9300
ES_JAVA_OPTS=-Xms12288m -Xmx12288m
cluster.name=es-docker-cluster

Docker Compose File:

version: '3.8'
services:
  es03:
    image: elasticsearch:7.13.2
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=192.168.0.10:9300,192.168.0.11:9300
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms12288m -Xmx12288m"
      - XPACK_SECURITY_ENABLED=false
      - XPACK_REPORTING_ENABLED=true
      - XPACK_MONITORING_ENABLED=true
      - http.port=9201
      - transport.tcp.port=9301
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /u01/docker/elk_workspace/elastic_data/es03:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
      - 9301:9300
    network_mode: host
    restart: always

frustrated23 · August 26, 2021, 12:21pm

Can anyone suggest next steps for debugging this problem?

Thanks in advance!

system · September 23, 2021, 12:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
