I've been running my ECK (Elastic Cloud on Kubernetes) cluster for a couple of weeks with no issues. However, three days ago Filebeat stopped being able to connect to my Elasticsearch service. All pods are up and running (Elasticsearch, Beats, and Kibana).
Also, shelling into the Filebeat pods and connecting to the Elasticsearch service works just fine:
curl -k -u "user:$PASSWORD" https://quickstart-es-http.quickstart.svc:9200
{
"name" : "aegis-es-default-4",
"cluster_name" : "quickstart",
"cluster_uuid" : "",
"version" : {
"number" : "7.14.0",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "",
"build_date" : "",
"build_snapshot" : false,
"lucene_version" : "8.9.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Yet the Filebeat pod logs keep producing the error below:
ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(elasticsearch(https://quickstart-es-http.quickstart.svc:9200)): Connection marked as failed because the onConnect callback failed: could not connect to a compatible version of Elasticsearch: 503 Service Unavailable: {"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
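Since the root endpoint responds fine while Filebeat's version check gets a 503, the next thing I can try from inside the pod is checking whether the cluster actually has an elected master (just a sketch of the checks I have in mind, using the same service URL and credentials as above):

# Overall cluster health, which should also surface master problems
curl -k -u "user:$PASSWORD" "https://quickstart-es-http.quickstart.svc:9200/_cluster/health?pretty"

# Which node, if any, is currently the elected master
curl -k -u "user:$PASSWORD" "https://quickstart-es-http.quickstart.svc:9200/_cat/master?v"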
I haven't made any changes, so I'm wondering if it's a case of authentication or the SSL certificates needing to be renewed?
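To rule that out, I plan to check the validity window of the certificate the HTTP service presents (a sketch, assuming openssl is available in the Filebeat image):

echo | openssl s_client -connect quickstart-es-http.quickstart.svc:9200 2>/dev/null | openssl x509 -noout -dates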
My Filebeat config looks like this:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: quickstart
  namespace: quickstart
spec:
  type: filebeat
  version: 7.14.0
  elasticsearchRef:
    name: quickstart
  config:
    filebeat:
      modules:
        - module: gcp
          audit:
            enabled: true
            var.project_id: project_id
            var.topic: topic_name
            var.subscription_name: sub_name
            var.credentials_file: /usr/certs/credentials_file
            var.keep_original_message: false
          vpcflow:
            enabled: true
            var.project_id: project_id
            var.topic: topic_name
            var.subscription_name: sub_name
            var.credentials_file: /usr/certs/credentials_file
          firewall:
            enabled: true
            var.project_id: project_id
            var.topic: topic_name
            var.subscription_name: sub_name
            var.credentials_file: /usr/certs/credentials_file
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
          - name: filebeat
            volumeMounts:
              - name: varlogcontainers
                mountPath: /var/log/containers
              - name: varlogpods
                mountPath: /var/log/pods
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
              - name: credentials
                mountPath: /usr/certs
                readOnly: true
        volumes:
          - name: varlogcontainers
            hostPath:
              path: /var/log/containers
          - name: varlogpods
            hostPath:
              path: /var/log/pods
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: credentials
            secret:
              defaultMode: 420
              secretName: elastic-service-account
And it was working just fine before; I haven't made any changes to this config that would make it lose access.
Any help, or pointers on where to go next with debugging, would be appreciated.
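For completeness, this is how I've been checking what the ECK operator itself reports (assuming the default operator install in the elastic-system namespace):

# Health of the Elasticsearch and Beat resources as seen by the operator
kubectl get elasticsearch,beat -n quickstart

# Recent operator logs, in case it flags certificate rotation or reconciliation errors
kubectl -n elastic-system logs statefulset/elastic-operator --tail=50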