Filebeat not starting log aggregation on some Kubernetes worker nodes

Hi All,

I've run into an issue with Filebeat in Kubernetes, but only on some worker nodes. In case it's useful, here's a bit of background:

  • Deployed using Helm, via Terraform
  • Currently 16 worker nodes; on 7 of them Filebeat refuses to process logs
  • Filebeat version 7.10.2
  • Running in AKS

Looking through the logs I'm just not able to tell why some filebeat pods work and some don't.

I've created a gist with:

  • Logs from a good pod (DEBUG enabled)
  • Logs from a bad pod (DEBUG enabled)
  • filebeat.yaml used to deploy across all the nodes

Things I've tried:

  • Destroying and recreating the EFK stack and filebeat
  • Wiping the beats-data persistence storage location on worker nodes for both working and non-working filebeat pods
  • Rebooting
  • Verifying that the logs are mounted and readable from within the pods

This will sound crazy, but the only common thread I can find is that the Filebeat pods that are not working are all on worker nodes whose names end in a letter. I think it's coincidental, but thought I'd mention it.
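
For anyone who wants to check the node-name pattern against their own cluster, here is a rough Python sketch of how the tally could be done. The node names and working/not-working flags below are made up for illustration (they are not my real nodes), and the helper names ends_in_letter and tally are just ones I picked. It only counts how often "name ends in a letter" lines up with "Filebeat not working"; it says nothing about the cause.

```python
from collections import Counter


def ends_in_letter(node_name: str) -> bool:
    """Return True if the last character of the node name is a letter."""
    return node_name[-1:].isalpha()


def tally(status_by_node: dict) -> Counter:
    """Count nodes by (name ends in a letter, Filebeat is working)."""
    counts = Counter()
    for node, working in status_by_node.items():
        counts[(ends_in_letter(node), working)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data: node name -> whether Filebeat ships logs from that node.
    status_by_node = {
        "node-000001": True,
        "node-000002": True,
        "node-00000a": False,
        "node-00000b": False,
    }
    for (letter_suffix, working), n in sorted(tally(status_by_node).items()):
        print(f"ends_in_letter={letter_suffix} working={working}: {n}")
```

If every non-working node lands in the ends_in_letter=True bucket and every working node in the other, the pattern is at least consistent across the whole cluster rather than something I eyeballed.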

Thanks in advance for any help. I do appreciate it.