Beats can't reach Elastic Service

I've been running my ECK (Elastic Cloud on Kubernetes) cluster for a couple of weeks with no issues. However, 3 days ago Filebeat stopped being able to connect to my Elasticsearch service. All pods are up and running (Elasticsearch, Beats and Kibana).

Also, shelling into the Filebeat pods and connecting to the Elasticsearch service works just fine:
curl -k -u "user:$PASSWORD" https://quickstart-es-http.quickstart.svc:9200

{
  "name" : "aegis-es-default-4",
  "cluster_name" : "quickstart",
  "cluster_uuid" : "",
  "version" : {
    "number" : "7.14.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "",
    "build_date" : "",
    "build_snapshot" : false,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Yet the Filebeat pod logs keep producing the error below:

ERROR	[publisher_pipeline_output]	pipeline/output.go:154	Failed to connect to backoff(elasticsearch(https://quickstart-es-http.quickstart.svc:9200)): Connection marked as failed because the onConnect callback failed: could not connect to a compatible version of Elasticsearch: 503 Service Unavailable: {"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}

I haven't made any changes, so I'm wondering whether it's a case of authentication or the SSL certificates needing to be updated?
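For reference, the same endpoint can also be queried for cluster health and master status from inside the Filebeat pod (same host, user and $PASSWORD as the curl above); if the problem is on the cluster side rather than auth/TLS, these return the same 503:

# Overall cluster health; returns 503 with master_not_discovered_exception if no master is elected
curl -k -u "user:$PASSWORD" "https://quickstart-es-http.quickstart.svc:9200/_cluster/health?pretty"

# Which node currently holds the master role (errors out if none is elected)
curl -k -u "user:$PASSWORD" "https://quickstart-es-http.quickstart.svc:9200/_cat/master?v"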

My Filebeat config looks like this:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: quickstart
  namespace: quickstart
spec:
  type: filebeat
  version: 7.14.0
  elasticsearchRef:
    name: quickstart
  config:
    filebeat:
      modules:
        - module: gcp
          audit:
            enabled: true
            var.project_id: project_id
            var.topic: topic_name
            var.subscription_name: sub_name
            var.credentials_file: /usr/certs/credentials_file
            var.keep_original_message: false
          vpcflow:
            enabled: true
            var.project_id: project_id
            var.topic: topic_name
            var.subscription_name: sub_name
            var.credentials_file: /usr/certs/credentials_file
          firewall:
            enabled: true
            var.project_id: project_id
            var.topic: topic_name
            var.subscription_name: sub_name
            var.credentials_file: /usr/certs/credentials_file
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
        - name: filebeat
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          - name: credentials
            mountPath: /usr/certs
            readOnly: true
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: credentials
          secret:
            defaultMode: 420
            secretName: elastic-service-account

And it was working just fine; I haven't made any changes to this config that would make it lose access.

Any help, or pointers on where to go next with debugging, would be appreciated.
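(For completeness, the ECK-reported health of the resources can be checked with something like the commands below; this assumes everything lives in the quickstart namespace, as in the manifests above.)

# HEALTH / PHASE columns as reported by the ECK operator
kubectl get elasticsearch -n quickstart
kubectl get beat -n quickstart
kubectl get kibana -n quickstart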

Hey @_bugc4t, thanks for your question.

Can you take a look at the Elasticsearch Pods' logs? They should tell you why there is an issue with master node election. Do the nodes have enough storage allocated to keep up with your needs? If there is nothing obvious there, you can use our diag tool and provide its output so I can investigate further.
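Something along these lines should surface the relevant log entries and storage numbers (a rough sketch; it assumes the quickstart namespace and the standard ECK pod labels, so adjust the names to your setup):

# Elasticsearch pod logs, filtered for master election and disk problems
kubectl logs -n quickstart -l elasticsearch.k8s.elastic.co/cluster-name=quickstart --tail=200 | grep -iE "master|disk|watermark"

# PersistentVolumeClaims backing the data nodes
kubectl get pvc -n quickstart

# Disk usage inside one of the Elasticsearch pods (default ECK data path)
kubectl exec -n quickstart <es-pod-name> -- df -h /usr/share/elasticsearch/data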

Thanks,
David

The nodes didn't have enough storage; I increased it.
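For anyone hitting the same thing: the fix amounts to raising the storage request in the Elasticsearch manifest, roughly as sketched below (this assumes the Elasticsearch resource is named quickstart and that the storage class supports volume expansion; otherwise the nodes have to be recreated with bigger volumes):

# Raise spec.nodeSets[].volumeClaimTemplates[].spec.resources.requests.storage
# (the default claim name is elasticsearch-data)
kubectl edit elasticsearch quickstart -n quickstart

# Watch the PVCs pick up the new size
kubectl get pvc -n quickstart -w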
