Master not discovered exception

Using Elasticsearch 7.1.0

My ECK CRD configuration:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: moss-es-cluster
spec:
  version: "7.1.0"
  nodes:
  - config:
      node.master: true
      node.data: true
      node.ingest: true
    podTemplate:
      metadata:
        labels:
          app: moss-es-node
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 4Gi
              cpu: 1
    nodeCount: 3
    # request 50Gi of persistent data storage for pods in this topology element
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: rook-block
```

I am getting a master not discovered exception and the cluster is in a red state.

Logs from the pods:

```
ter-es-phkl755hgg}{z_DYKZ60Sl-3miLEm7oiuA}{p0-DzHkHRjS66M_APpnXXw}{10.2.1.23}{10.2.1.23:9300}{ml.machine_memory=12884901888, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
{"type": "server", "timestamp": "2019-06-28T06:04:15,345+0000", "level": "DEBUG", "component": "o.e.a.a.c.s.TransportClusterUpdateSettingsAction", "cluster.name": "moss-es-cluster", "node.name": "moss-es-cluster-es-phkl755hgg", "message": "timed out while retrying [cluster:admin/settings/update] after failure (timeout [30s])" }
{"type": "server", "timestamp": "2019-06-28T06:04:15,345+0000", "level": "WARN", "component": "r.suppressed", "cluster.name": "moss-es-cluster", "node.name": "moss-es-cluster-es-phkl755hgg", "message": "path: /_cluster/settings, params: {}" ,
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:555) [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.1.0.jar:7.1.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
```

Can anyone please help? This happens when I attach a PVC (rook). Without the PVC, using the local filesystem, the cluster is able to elect a master.

One more observation: if I bring up one master first and then add the remaining 4 masters, the cluster comes up and runs fine. If I start with all 5 master nodes initially, the cluster does not come up and throws the master not discovered exception. Any idea?
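For context, this behavior is tied to how Elasticsearch 7.x bootstraps a new cluster: the very first master election requires `cluster.initial_master_nodes` to list the expected master-eligible node names, and this setting is only honored once, at first startup. ECK manages this setting itself, so the fragment below is purely illustrative (the node names are hypothetical) of what the operator effectively injects; if the injected names don't match the pods that actually come up, the cluster can never form:

```yaml
# Illustrative elasticsearch.yml fragment -- ECK normally sets this
# automatically; node names here are hypothetical placeholders.
cluster.initial_master_nodes:
  - master-node-0
  - master-node-1
  - master-node-2
```

Starting with a single master sidesteps the problem because bootstrapping completes with that one node, and the other masters then simply join the already-formed cluster through discovery.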

Hi @bikkina_mahesh
This could be related to https://github.com/elastic/cloud-on-k8s/issues/1201: we have a bug with setting initial master nodes on clusters whose pods are restarting.

Can you provide more details about your setup? We created a script to help debug common scenarios.

It outputs information about the Elastic k8s resources into a few text files, including:

  • the operator logs
  • the Elasticsearch logs
  • the ES resource
  • the list of pods
  • the list of secrets (without their content)
  • etc.

Could you delete your cluster, create it with the 5 master nodes you mentioned, then run this script against your cluster and post the archive file in this issue?

```sh
./eck-dump.sh --output-directory eck_dump --create-zip
```

One doubt: how do I upload files on the issue page? I could not find any option here to upload zip files.

You can access the dump from this link: https://github.com/bikkinamahesh369/eck_dump

Thanks @bikkina_mahesh.
Based on the Elasticsearch logs, I think this is definitely related to https://github.com/elastic/cloud-on-k8s/issues/1201.
This can happen when reusing existing persistent volumes, or modifying an existing cluster spec before the cluster is formed.
I think if you delete your cluster, and also delete all existing PersistentVolumeClaims and PersistentVolumes, then recreate your cluster, you should not have this problem.
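A sketch of those cleanup steps, assuming the `moss-es-cluster` example above (the label selector is ECK's cluster-name label; double-check which PVs are bound to the old claims before deleting anything):

```sh
# Delete the Elasticsearch resource itself
kubectl delete elasticsearch moss-es-cluster

# Delete the PersistentVolumeClaims that were created for the cluster
kubectl delete pvc -l elasticsearch.k8s.elastic.co/cluster-name=moss-es-cluster

# Inspect remaining PersistentVolumes and delete any left in "Released"
# state that were bound to the old claims
kubectl get pv
```

Recreating the cluster afterwards gives it fresh volumes, so no stale cluster state is reused during bootstrapping.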
Definitely something we need to fix in upcoming releases.

Thanks for your help, will try.