Filebeat 7.4.0 does not recover when it fails to connect to the k8s API

I am using the filebeat elastic helm chart under an Istio service mesh.

As of filebeat 7.4.0, with the new k8s client, filebeat starts faster than the Istio sidecar, which blocks outbound requests to the k8s API until it is up. As a result, filebeat never recovers the k8s connection and I lose all k8s metadata on my log events.

2019-10-17T20:33:29.733Z ERROR kubernetes/util.go:85 kubernetes: Querying for pod failed with error: Get dial tcp connect: connection refused
E1017 20:33:29.734464 1 reflector.go:125] Failed to list *v1.Pod: Get
ceVersion=0: dial tcp connect: connection refused

I found a workaround by applying a sleep before starting filebeat, but I don't want it to be permanent.

A similar issue is described in

Will a retry or exponential backoff be added in upcoming versions?

Hi @GreenKnight15,

Let me try to reproduce this and follow the code.
I'm not sure whether Beats has a policy of retrying or failing; it looks like the behaviour is mostly dictated by the library being used.

If you'd like, open an issue where that behaviour can be discussed while I figure out how it currently works.

Hi again @GreenKnight15 ,

There are 3 features potentially affected by this issue:

  • autodiscover
  • add kubernetes metadata
  • a watcher that is set up for kubernetes metricsets to enrich events

I think we can do something there, but we need to get the product designers involved. Can you please create a GH issue?


I went ahead and opened a GH ticket
