Hi,
We are occasionally seeing an issue with the Kubernetes autodiscover feature in Filebeat 7.9.1.
If the API server is unavailable when filebeat tries to determine which node it's running on, filebeat might report an error like the following:
{"level":"error","timestamp":"2021-01-05T09:10:12.114Z","logger":"autodiscover.pod","caller":"kubernetes/util.go:117","message":"kubernetes: Querying for pod failed with error: Get \"{API_SERVER}/api/v1/namespaces/monitoring/pods/filebeat-ds-j6tfs\": dial tcp: i/o timeout"}
When this happens, no pod events seem to be detected and no logs are shipped.
From the code, it looks like the DiscoverKubernetesNode() function will return localhost in this error case, which is then used as the watch filter for pod/node events.
Would it be possible for Filebeat to either retry node discovery or refuse to start entirely when node discovery fails?
Cheers,
Scott