Master not discovered exception

Using Elasticsearch 7.1.0

My ECK CRD configuration:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: moss-es-cluster
spec:
  version: "7.1.0"
  nodes:
  - config:
      node.master: true
      node.data: true
      node.ingest: true
    podTemplate:
      metadata:
        labels:
          app: moss-es-node
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 4Gi
              cpu: 1
    nodeCount: 3
    # request 50Gi of persistent data storage for pods in this topology element
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: rook-block
```

I am getting a master not discovered exception and the cluster is in a red state.

Logs from the pods:

```
ter-es-phkl755hgg}{z_DYKZ60Sl-3miLEm7oiuA}{p0-DzHkHRjS66M_APpnXXw}{10.2.1.23}{10.2.1.23:9300}{ml.machine_memory=12884901888, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
{"type": "server", "timestamp": "2019-06-28T06:04:15,345+0000", "level": "DEBUG", "component": "o.e.a.a.c.s.TransportClusterUpdateSettingsAction", "cluster.name": "moss-es-cluster", "node.name": "moss-es-cluster-es-phkl755hgg", "message": "timed out while retrying [cluster:admin/settings/update] after failure (timeout [30s])" }
{"type": "server", "timestamp": "2019-06-28T06:04:15,345+0000", "level": "WARN", "component": "r.suppressed", "cluster.name": "moss-es-cluster", "node.name": "moss-es-cluster-es-phkl755hgg", "message": "path: /_cluster/settings, params: {}" ,
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:555) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.1.0.jar:7.1.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
```

Can anyone please help? This happens when I attach a PVC (rook). Without the PVC, using the local filesystem, the cluster is able to elect a master.

One more observation: if I bring up one master first and then add the remaining 4 masters, the cluster comes up and runs fine. If I start with all 5 master nodes initially, the cluster does not come up and throws the master not discovered exception. Any idea?
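For context, this behavior is tied to how Elasticsearch 7.x bootstraps a new cluster: the very first master election requires `cluster.initial_master_nodes` to list the expected master-eligible node names, and this setting is only honored once, at first startup. ECK manages this setting itself, so the fragment below is purely illustrative (the node names are hypothetical) of what the operator effectively injects; if the injected names don't match the pods that actually come up, the cluster can never form:

```yaml
# Illustrative elasticsearch.yml fragment -- ECK normally sets this
# automatically; node names here are hypothetical placeholders.
cluster.initial_master_nodes:
  - master-node-0
  - master-node-1
  - master-node-2
```

Starting with a single master sidesteps the problem because bootstrapping completes with that one node, and the other masters then simply join the already-formed cluster through discovery.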

Hi @bikkina_mahesh
This could be related to https://github.com/elastic/cloud-on-k8s/issues/1201: we have a bug with setting initial master nodes on clusters whose pods are restarting.

Can you provide more details about your setup? We created a script to help debug common scenarios.

It outputs information about the Elastic k8s resources into a few text files, including:

  • the operator logs
  • the Elasticsearch logs
  • the ES resource
  • the list of pods
  • the list of secrets (without their content)
  • etc.

Could you delete your cluster, create it with the 5 master nodes you mentioned, then run this script against your cluster and post the archive file in this issue?

```sh
./eck-dump.sh --output-directory eck_dump --create-zip
```

One doubt: how do I upload files on the issue page? I could not find any option here to upload zip files.

You can access the dump from this link: https://github.com/bikkinamahesh369/eck_dump

Thanks @bikkina_mahesh.
Based on the Elasticsearch logs, I think this is definitely related to https://github.com/elastic/cloud-on-k8s/issues/1201.
This can happen when reusing existing persistent volumes, or modifying an existing cluster spec before the cluster is formed.
I think if you delete your cluster, and also delete all existing PersistentVolumeClaims and PersistentVolumes, then recreate your cluster, you should not have this problem.
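A sketch of those cleanup steps, assuming the `moss-es-cluster` example above (the label selector is ECK's cluster-name label; double-check which PVs are bound to the old claims before deleting anything):

```sh
# Delete the Elasticsearch resource itself
kubectl delete elasticsearch moss-es-cluster

# Delete the PersistentVolumeClaims that were created for the cluster
kubectl delete pvc -l elasticsearch.k8s.elastic.co/cluster-name=moss-es-cluster

# Inspect remaining PersistentVolumes and delete any left in "Released"
# state that were bound to the old claims
kubectl get pv
```

Recreating the cluster afterwards gives it fresh volumes, so no stale cluster state is reused during bootstrapping.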
Definitely something we need to fix in upcoming releases.

Thanks for your help, will try.