Master not discovered exception

Using Elasticsearch 7.1.0

My ECK CRD configuration:
kind: Elasticsearch
labels: "1.0"
name: moss-es-cluster
version: "7.1.0"

  • config:
    node.master: true true
    node.ingest: true
    app: moss-es-node
    - name: elasticsearch
    memory: 4Gi
    cpu: 1
    nodeCount: 3

    this shows how to request 2Gi of persistent data storage for pods in this topology element

    • metadata:
      name: data
      • ReadWriteOnce
        storage: 50Gi
        storageClassName: rook-block
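The configuration above lost most of its nesting when it was pasted. As a rough sketch only, the fragments might fit together as follows, assuming the ECK `v1alpha1` API layout (`spec.nodes`, `podTemplate`, `volumeClaimTemplates`). The nesting, the choice of resource limits rather than requests, and the `render_manifest` helper name are assumptions, not the poster's actual file. The `labels: "1.0"` line is left out because its key is missing, and the duplicated `true` on `node.master` may hide a setting that was lost.

```shell
#!/bin/sh
# Hypothetical helper: prints one possible full manifest assembled from the
# fragments in the post. Field nesting is assumed from the ECK v1alpha1 API.
render_manifest() {
cat <<'EOF'
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: moss-es-cluster
spec:
  version: "7.1.0"
  nodes:
  - nodeCount: 3
    config:
      node.master: true
      node.ingest: true
    podTemplate:
      metadata:
        labels:
          app: moss-es-node
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:          # assumed: limits rather than requests
              memory: 4Gi
              cpu: 1
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: rook-block
EOF
}
```

With a layout like this, `render_manifest | kubectl apply -f -` would submit the manifest to the cluster.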

I am getting a master not discovered exception and the cluster is in a red state.

Pod logs:

ter-es-phkl755hgg}{z_DYKZ60Sl-3miLEm7oiuA}{p0-DzHkHRjS66M_APpnXXw}{}{}{ml.machine_memory=12884901888, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
{"type": "server", "timestamp": "2019-06-28T06:04:15,345+0000", "level": "DEBUG", "component": "o.e.a.a.c.s.TransportClusterUpdateSettingsAction", "cluster.name": "moss-es-cluster", "node.name": "moss-es-cluster-es-phkl755hgg", "message": "timed out while retrying [cluster:admin/settings/update] after failure (timeout [30s])" }
{"type": "server", "timestamp": "2019-06-28T06:04:15,345+0000", "level": "WARN", "component": "r.suppressed", "cluster.name": "moss-es-cluster", "node.name": "moss-es-cluster-es-phkl755hgg", "message": "path: /_cluster/settings, params: {}" ,
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at$AsyncSingleAction$4.onTimeout( [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout( [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout( [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.cluster.service.ClusterApplierService$ [elasticsearch-7.1.0.jar:7.1.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ [elasticsearch-7.1.0.jar:7.1.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$ [?:?]",

Can anyone please help? This happens when I attach a PVC (Rook). Without a PVC, using the local file system, the cluster is able to elect a master.

One more observation: if I bring one master up and then add the remaining 4 masters, the cluster comes up and runs fine. If I start with all 5 master nodes at once, the cluster does not come up and fails with the master not discovered exception. Any idea?

Hi @bikkina_mahesh
This could be related to a bug we have with setting initial master nodes on clusters whose pods are restarting.

Can you provide more details about your setup? We created a script to help debug common scenarios.

It writes information about the Elastic k8s resources to a few text files, including:

  • the operator logs
  • the Elasticsearch logs
  • the ES resource
  • the list of pods
  • the list of secrets (without their content)
  • etc.

Could you delete your cluster, create it with the 5 master nodes you mentioned, then run this script against your cluster and post the archive file in this issue?

./ --output-directory eck_dump --create-zip

One question: how do I upload files on the issue page? I could not find any option here to upload zip files.

You can access the dump from this link

Thanks @bikkina_mahesh.
Based on the Elasticsearch logs, I think this is definitely related to
This can happen when reusing existing persistent volumes, or modifying an existing cluster spec before the cluster is formed.
I think if you delete your cluster, and also delete all existing PersistentVolumeClaims and PersistentVolumes, then recreate your cluster, you should not have this problem.
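The cleanup described above could be sketched roughly like this. The namespace, the `cleanup_cluster` helper name, and the `elasticsearch.k8s.elastic.co/cluster-name` label selector are assumptions; check the labels on your own claims with `kubectl get pvc --show-labels` before relying on the selector.

```shell
#!/bin/sh
# Sketch: remove the cluster and its leftover storage before recreating it.
# CLUSTER, NAMESPACE and the label selector are assumptions; adjust as needed.
KUBECTL="${KUBECTL:-kubectl}"
CLUSTER="${CLUSTER:-moss-es-cluster}"
NAMESPACE="${NAMESPACE:-default}"

cleanup_cluster() {
    # Delete the Elasticsearch resource so the operator stops managing the pods.
    $KUBECTL delete elasticsearch "$CLUSTER" -n "$NAMESPACE"
    # Delete the claims left behind by the old pods (selector is assumed).
    $KUBECTL delete pvc -n "$NAMESPACE" \
        -l "elasticsearch.k8s.elastic.co/cluster-name=$CLUSTER"
    # List the remaining volumes; any still tied to the old cluster
    # need to be deleted by hand before the cluster is recreated.
    $KUBECTL get pv
}
```

Calling `cleanup_cluster` runs the three steps in order. Whether the PersistentVolumes disappear together with their claims depends on the storage class's reclaim policy, which is why the last step only lists them.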
Definitely something we need to fix in upcoming releases.

Thanks for your help. I will try that.