503 Service Unavailable - Master not discovered error in AKS

Hi,
I have an AKS cluster (k1) with 3 nodes, on which I have deployed the ECK operator using all_in_one.yaml.
I have another AKS cluster (k2) which runs my application stack. I have installed Filebeat in this cluster (k2) and it ships the logs to Elasticsearch in the other k8s cluster (k1); the relevant Filebeat output is sketched below.
My k2 cluster has 3 nodes (4 vCPU, 16 GiB RAM, 32 GiB temp storage).
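
The Filebeat side just points at the LoadBalancer IP of the Elasticsearch HTTP service in k1, roughly like this (the IP, user and password below are placeholders):

output.elasticsearch:
  # External LoadBalancer IP of the dev-elasticsearch-es-http service in k1
  hosts: ["http://10.x.x.x:9200"]
  username: "elastic"
  password: "<value from the dev-elasticsearch-es-elastic-user secret>"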
My elasticsearch.yaml:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: dev-elasticsearch
spec:
  version: 7.9.2
  http:
    # Disable https
    tls:
      selfSignedCertificate:
        disabled: true
    service:
      spec:
        type: LoadBalancer
  nodeSets:
  - name: default
    count: 3
    config:
      node.master: true
      node.data: true
      node.store.allow_mmap: false
      xpack.security.enabled: true
      discovery.seed_hosts:
         - 10.x.x.x
         - 10.x.x.x
         - 10.x.x.x

When I start ES and Kibana, everything works fine for about 30 minutes, and then I get a service unavailable error from ES:
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}

Another thing I noticed is that the elastic-internal-init-filesystem container is in the Terminated state.

  1. Can someone help me resolve this issue?
  2. Also, how can I write the logs to Azure storage with more space instead of the default PV?

Thanks for your help !!

You should not set discovery.seed_hosts yourself. The ECK operator takes care of it for you and keeps it up to date at all times.
Can you try removing it from the configuration? It will likely help with node discovery.
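
For reference, your spec would then look like this (identical to what you posted, just without discovery.seed_hosts):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: dev-elasticsearch
spec:
  version: 7.9.2
  http:
    # Disable https
    tls:
      selfSignedCertificate:
        disabled: true
    service:
      spec:
        type: LoadBalancer
  nodeSets:
  - name: default
    count: 3
    config:
      node.master: true
      node.data: true
      node.store.allow_mmap: false
      xpack.security.enabled: true
      # no discovery.seed_hosts here -- the operator handles discovery itself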

Hi @sebgl - thanks, I will remove discovery.seed_hosts, although I am not seeing any issues right now with it in the Elasticsearch manifest. The workloads are running fine now. To resolve my issue I had to give the pods more storage: around 20 GiB of data comes in from my application Kubernetes cluster, and with the default (2 GiB) the space ran out in about 20-30 minutes, which caused the service unavailable error.

  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 90Gi
      storageClassName: azurefile

Big thanks to this community - I have been able to make headway reading articles and issue resolutions.
On to my other question: how can I have rolling persistent volumes? I am using around 100 GiB of persistent volume claims, but that will run out someday, probably within a month, so I want a mechanism that will archive my pod volume data and keep it always available. Please let me know if there is a way to do that.

that will run out someday, probably within a month, so I want a mechanism that will archive my pod volume data and keep it always available

Sounds like you may be interested in using the cold and frozen tiers. The idea is basically that you can roll data over after some time (or at a given shard size) and back it up in a snapshot where it still stays searchable.
See Data tiers | Elasticsearch Guide [master] | Elastic and Directly search S3 with the new frozen tier | Elastic Blog.
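
With ECK, the tier topology itself is just node roles per nodeSet. A minimal sketch, assuming you upgrade to 7.10+ (and a recent enough operator) where node.roles and data tiers are available; the nodeSet names, counts and storage sizes below are only examples:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: dev-elasticsearch
spec:
  version: 7.12.0   # data tiers need 7.10+, the frozen tier 7.12+
  nodeSets:
  - name: hot
    count: 3
    config:
      node.store.allow_mmap: false
      node.roles: ["master", "data_hot", "data_content", "ingest"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: azurefile
  - name: cold
    count: 1
    config:
      node.store.allow_mmap: false
      node.roles: ["data_cold"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: azurefile

The rollover rules themselves (max age / max shard size) then live in an ILM policy defined through the Elasticsearch API or Kibana, and the searchable-snapshot part additionally needs a snapshot repository, for example Azure Blob storage via the repository-azure plugin.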
