ECK - "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster"

Hi team,

We are facing a strange issue: when we set the master node count in the Elasticsearch custom resource to more than 1, the master nodes are not able to elect a master. We have tried odd counts up to 11 and the election still never happens. Logs from the master pod:

{"type": "server", "timestamp": "2020-06-20T09:25:59,336Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elasticsearch-config", "node.name": "elasticsearch-config-es-master-0", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elasticsearch-config-es-master-0, elasticsearch-config-es-master-1, elasticsearch-config-es-master-2, elasticsearch-config-es-master-3, elasticsearch-config-es-master-4] to bootstrap a cluster: have discovered [{elasticsearch-config-es-master-0}{zHswPo-WT6uH0ZrEt4tgdQ}{FNfaLOFVQGCEWEj4vLG1mA}{10.124.15.227}{10.124.15.227:9300}{lm}{ml.machine_memory=3221225472, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.124.14.144:9300, 10.124.24.146:9300, 10.124.3.180:9300, 10.124.6.67:9300] from hosts providers and [{elasticsearch-config-es-master-0}{zHswPo-WT6uH0ZrEt4tgdQ}{FNfaLOFVQGCEWEj4vLG1mA}{10.124.15.227}{10.124.15.227:9300}{lm}{ml.machine_memory=3221225472, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
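For reference, the custom resource in question looks roughly like this (a minimal sketch, assuming the ECK v1 API; the cluster name matches the log above, everything else is illustrative):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-config
spec:
  version: 7.7.0        # illustrative; any 7.x shows the same bootstrap behaviour
  nodeSets:
  - name: master
    count: 5            # all 5 master-eligible nodes must discover each other to bootstrap
    config:
      node.master: true
      node.data: false
```

The log message reflects this: with count 5, initial cluster bootstrapping will not proceed until all five master-eligible pods can reach each other on port 9300.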

It seems strange, as we are using the same config to deploy into multiple namespaces in this cluster and it was working fine; suddenly this issue came out of nowhere.

The only change to the GKE cluster is that it was patched from 1.15 to 1.16. Could that cause this issue?

One more observation after troubleshooting:

the headless service used for transport between the master pods, elasticsearch-config-es-transport, has the pods' internal endpoints associated with it:

Name:              elasticsearch-config-es-transport
Namespace:         test-logstash-1
Labels:            common.k8s.elastic.co/type=elasticsearch
                   elasticsearch.k8s.elastic.co/cluster-name=elasticsearch-config
Annotations:       <none>
Selector:          common.k8s.elastic.co/type=elasticsearch,elasticsearch.k8s.elastic.co/cluster-name=elasticsearch-config
Type:              ClusterIP
IP:                None
Port:              <unset>  9300/TCP
TargetPort:        9300/TCP
Endpoints:         10.124.14.144:9300,10.124.15.227:9300,10.124.24.146:9300 + 5 more...
Session Affinity:  None
Events:            <none>

I am trying to nc from one pod to the other pods and I am getting connection refused:

from pod 2 to pod 1, i.e. master 2 to master 1:
nc -vz 10.124.14.144 9300
connection refused

I am wondering how networking can be affected, as these pods are in the same namespace and on the same network; ping does work between them, though.

Some more troubleshooting:

  1. internal IPs of the master pods:
    kk8soptr@jumpbox:~$ kubectl get pods -l master=node -n test-logstash-1 -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
    10.124.15.227
    10.124.6.67
    10.124.3.180
    10.124.24.146
    10.124.14.144
  2. testing from within the pods:
   [root@logstash-0 logstash]#  for ep in 10.124.15.227:9300 10.124.6.67:9300 10.124.3.180:9300 10.124.24.146:9300 10.124.14.144:9300; do
   >     wget -qO- $ep
   > done
   no response
   [root@elasticsearch-config-es-master-0 elasticsearch]#  for ep in 10.124.15.227:9300 10.124.6.67:9300 10.124.3.180:9300 10.124.24.146:9300 10.124.14.144:9300; do wget -qO- $ep; done
   no response
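As an aside, wget is not conclusive here: port 9300 speaks Elasticsearch's binary transport protocol, not HTTP, so "no response" can occur even against a healthy node. A raw TCP probe distinguishes an open port from a refused or filtered one. A minimal sketch (the probe helper is hypothetical; it relies on bash's /dev/tcp, so no nc or curl is needed inside the container):

```shell
#!/usr/bin/env bash
# probe HOST PORT: report whether a raw TCP connection succeeds.
probe() {
  local host=$1 port=$2
  # open fd 3 to host:port inside a short-lived bash; refused/filtered -> nonzero
  if timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} closed"
  fi
}

# check every master pod's transport port (IPs from the list above)
for ip in 10.124.15.227 10.124.6.67 10.124.3.180 10.124.24.146 10.124.14.144; do
  probe "$ip" 9300
done
```

"open" for the pod's own IP but "closed" for all the others would point firmly at something between the nodes rather than at Elasticsearch itself.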

Thanks in advance

I am looking into it further and it seems to be a GKE cluster issue. Using nodeAffinity I can comfortably spin up all the master pods on a single node, but without node affinity the pods cannot communicate on port 9300 because they are scheduled onto different nodes of the node pool. It seems some firewall is blocking connections between nodes in the same node pool.
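If it really is a VPC firewall problem, an explicit allow rule for the transport port between nodes should confirm it. A sketch (the rule name, network name, and source range below are placeholders, not values from our cluster; GKE normally creates an automatic intra-cluster allow rule, so it is worth checking whether that rule survived the 1.15 -> 1.16 patch before adding anything):

```shell
# Inspect existing rules; look for GKE's auto-created "gke-<cluster>-...-all"
# rule, which normally permits pod-to-pod traffic inside the cluster.
gcloud compute firewall-rules list

# If it is missing, explicitly allow the Elasticsearch transport port
# between nodes. YOUR_VPC_NETWORK and POD_CIDR are placeholders.
gcloud compute firewall-rules create allow-es-transport \
  --network=YOUR_VPC_NETWORK \
  --allow=tcp:9300 \
  --source-ranges=POD_CIDR
```

If the nc test between pods succeeds after the rule is added, that would confirm the firewall theory.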