Hi everyone,
I am trying to set up a 3-node cluster. Each ES node runs in a Docker container on a separate physical host. I was able to set up es01 as the initial member and add the second node, es02, but I cannot get es03 to join the cluster. The third node is on a separate network, but the firewall rules seem to work (I can curl/telnet to the various ports). I have a Logstash instance on the same host, and it is able to send data to es01 and es02 successfully, but es03 still will not join.
I turned on trace logging, and es03 just repeats the discovery cycle continuously. I believe it is discovering the other nodes properly, but for some reason it will not join the cluster. Any insight?
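For anyone who wants to repeat the curl/telnet reachability check in a scriptable form, here is a minimal sketch. The helper names `can_connect` and `check_transport` are made up for illustration, and the addresses in the example are simply the transport addresses that appear in the logs below. Note that a successful TCP connect only shows that the port is open; it does not prove that the Elasticsearch transport handshake or join validation will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, unreachable hosts, DNS errors.
        return False


def check_transport(targets, timeout: float = 3.0) -> dict:
    """Map 'host:port' to a reachability flag for each (host, port) pair."""
    return {f"{host}:{port}": can_connect(host, port, timeout)
            for host, port in targets}


if __name__ == "__main__":
    # Transport addresses as they appear in the logs (assumed still current).
    targets = [
        ("192.168.0.10", 9300),  # es01
        ("192.168.0.11", 9300),  # es02
        ("192.168.1.10", 9301),  # es03
    ]
    for address, ok in check_transport(targets).items():
        print(f"{address}: {'reachable' if ok else 'NOT reachable'}")
```

Running it from each of the three hosts (or from inside each container) would show whether the transport ports are reachable in both directions, not only from es03 outwards.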
es03 log:
{"type": "server", "timestamp": "2021-08-24T21:51:25,261Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.11:9300, discoveryNode={es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=false} requesting peers" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.10:9300, discoveryNode={es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=false} requesting peers" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "probing master nodes from cluster state: nodes: \n {es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}{ml.machine_memory=33464684544, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=12884901888}, local\n" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,262Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "startProbe(192.168.1.10:9301) not probing local node" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "resolved host [192.168.0.11:9300] to [192.168.0.11:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "resolved host [192.168.0.10:9300] to [192.168.0.10:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,263Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "probing resolved transport addresses [192.168.0.11:9300, 192.168.0.10:9300]" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,267Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.10:9300, discoveryNode={es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional[{es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}], knownPeers=[], term=13}" }
{"type": "server", "timestamp": "2021-08-24T21:51:25,267Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "Peer{transportAddress=192.168.0.11:9300, discoveryNode={es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional[{es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}{ml.machine_memory=33464709120, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}], knownPeers=[], term=13}" }
{"type": "server", "timestamp": "2021-08-24T21:51:26,106Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es01] to bootstrap a cluster: have discovered [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}, {es02}{XSQzKuniSIyXg4zOudc_7A}{1JfxnrRJSTimPw_qz2LOlg}{192.168.0.11}{192.168.0.11:9300}{cdfhilmrstw}, {es01}{xsVvKOd2SBeYzJgpW-b5RQ}{iJeiZrRRQPS-GJb7DxUAaA}{192.168.0.10}{192.168.0.10:9300}{cdfhilmrstw}]; discovery will continue using [192.168.0.11:9300, 192.168.0.10:9300] from hosts providers and [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{LrNTCc-BRBeHN2Qdd8DY7Q}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}] from last-known cluster state; node term 13, last-accepted version 0 in term 0" }
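Because the trace output is very repetitive, a small script can help separate the occasional WARN entry from the TRACE noise. This is only a sketch: it assumes one JSON object per line with `level`, `component`, and `message` fields as in the entries above, the function names are illustrative, and the log file path is whatever you pass on the command line. Entries that span several lines (for example, ones with a stack trace) do not parse as a single line and are skipped.

```python
import json
import sys
from collections import Counter


def parse_entries(lines):
    """Yield each line that parses as a JSON object; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. one half of a multi-line entry
        if isinstance(entry, dict):
            yield entry


def summarize(lines) -> Counter:
    """Count log entries per (level, component) pair."""
    return Counter((e.get("level"), e.get("component"))
                   for e in parse_entries(lines))


def above_trace(lines):
    """Return (timestamp, level, message) for every non-TRACE entry."""
    return [(e.get("timestamp"), e.get("level"), e.get("message"))
            for e in parse_entries(lines)
            if e.get("level") != "TRACE"]


if __name__ == "__main__":
    # Usage: python summarize_log.py <path-to-log-file>
    with open(sys.argv[1], encoding="utf-8") as handle:
        content = handle.readlines()
    for (level, component), count in summarize(content).most_common():
        print(f"{count:6d}  {level:5s}  {component}")
    for timestamp, level, message in above_trace(content):
        print(f"{timestamp} {level}: {message}")
```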
The currently elected master node (es01) throws the error below, but only when I stop es03:
{"type": "server", "timestamp": "2021-08-24T21:49:40,345Z", "level": "WARN", "component": "o.e.c.c.Coordinator", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "failed to validate incoming join request from node [{es03}{l0_UmsCwQ1KBNPtb3jkMiQ}{mZ3w_2SUQx-_pgJgPqoaHw}{192.168.1.10}{192.168.1.10:9301}{cdfhilmrstw}{ml.machine_memory=33464684544, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=12884901888, transform.node=true}]", "cluster.uuid": "BGTrs1VSQDKnYxwVMqVRCg", "node.id": "xsVvKOd2SBeYzJgpW-b5RQ" ,
"stacktrace": ["org.elasticsearch.transport.NodeDisconnectedException: [es03][192.168.1.10:9301][internal:cluster/coordination/join/validate] disconnected"] }