Kubernetes autodiscover fails if filebeat cannot determine its node name


We are (rarely) seeing an issue using the k8s autodiscover feature in filebeat 7.9.1.

If the API server is unavailable when filebeat tries to determine which node it's running on, filebeat might report an error like the following:

{"level":"error","timestamp":"2021-01-05T09:10:12.114Z","logger":"autodiscover.pod","caller":"kubernetes/util.go:117","message":"kubernetes: Querying for pod failed with error: Get \"{API_SERVER}/api/v1/namespaces/monitoring/pods/filebeat-ds-j6tfs\": dial tcp: i/o timeout"}

When this happens, no pod events seem to be detected and no logs are shipped.

From the code, it looks like the DiscoverKubernetesNode() function will return localhost in this error case, which is then used as the watch filter for pod/node events.

Is it possible to either retry or fail to start up entirely on a node discovery failure?



Something similar was covered for the add_kubernetes_metadata processor initialisation in https://github.com/elastic/beats/pull/16373. Feel free to create a GitHub issue for this case and include the details of your problem so that the team can evaluate this feature.


Thanks for the quick response Chris!
I just created https://github.com/elastic/beats/issues/23400