Single-master cluster dies when the master node dies

I am using ECK to deploy an ES cluster. My setup has 1 master node and 3 data nodes. If the master node dies for some reason, it comes back up (thanks to StatefulSets in Kubernetes), but it associates itself with a different cluster UUID and therefore rejects the join requests of the data nodes, which still belong to the old cluster.

However, when I run an upgrade or something similar where Kubernetes "safely" removes and brings back the master node, the cluster becomes healthy again after a while.

I tried with 3 master nodes as well, but when I kill one node it is never able to rejoin the existing cluster and the cluster stays in a yellow state forever.

Now my question is:

If some issue happens and the master node is not restarted safely, how can I make sure the cluster forms correctly again?

Here's the YAML I use to deploy the cluster with ECK on my Kubernetes cluster.

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: sifter-elastic-data-factory
spec:
  version: 7.14.0
  nodeSets:
    - name: master
      count: 1
      config:
        node.master: true
        node.data: false
        node.ingest: false
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 8Gi
                  cpu: 3000m
                limits:
                  memory: 8Gi
                  cpu: 3000m
              env:
                - name: ES_JAVA_OPTS
                  value: -Xms6g -Xmx6g
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 50Gi
            storageClassName: ssd
    - name: data
      count: 3
      config:
        node.master: false
        node.data: true
        node.ingest: true
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 8Gi
                  cpu: 3000m
                limits:
                  memory: 8Gi
                  cpu: 3000m
              env:
                - name: ES_JAVA_OPTS
                  value: -Xms6g -Xmx6g
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 60Gi
            storageClassName: ssd
  http:
    service:
      spec:
        type: ClusterIP
    tls:
      selfSignedCertificate:
        disabled: true
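
Side note: node.master, node.data and node.ingest are deprecated since Elasticsearch 7.9 in favour of node.roles. A minimal sketch of the same two nodeSets expressed with node.roles (only the config blocks change, everything else stays as above):

  nodeSets:
    - name: master
      count: 1
      config:
        node.roles: ["master"]
    - name: data
      count: 3
      config:
        node.roles: ["data", "ingest"]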

Are you using persistent volumes for the master?

Yes, a persistent volume for the master as well as for the data nodes. Could this be an issue? @warkolm

@warkolm I guess this was the issue. When I removed the persistent storage from the master node, the cluster always comes back up without losing data.

Follow-up question: what is the downside of not using persistent storage with the master node?

Your masters need persistent storage - Node | Elasticsearch Guide [7.14] | Elastic
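
For reference, a sketch of a more resilient layout with the same ECK setup: three dedicated master nodes, each keeping its cluster state on a small persistent volume. The podTemplate sections (sysctl init container, resources, heap) are omitted for brevity, and the master storage size is a placeholder, not a recommendation:

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: sifter-elastic-data-factory
spec:
  version: 7.14.0
  nodeSets:
    - name: master
      count: 3                          # three masters: one can fail without losing quorum
      config:
        node.master: true
        node.data: false
        node.ingest: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data    # masters persist the cluster state (and cluster UUID) here
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi           # placeholder; masters only hold cluster metadata
            storageClassName: ssd
    - name: data
      count: 3
      config:
        node.master: false
        node.data: true
        node.ingest: true
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 60Gi
            storageClassName: ssd

With a persistent volume, a restarted master reuses its existing data path, so it keeps the same cluster UUID and the data nodes can rejoin it; with three masters, the remaining two keep a quorum while one is down, so the cluster can still elect a master.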
