Hi All,
I've run into an issue with Filebeat in Kubernetes, but only on some worker nodes. In case it's useful, here's a bit of background:
- Deployed using Helm, via Terraform (rough CLI equivalent sketched after this list)
- Currently 16 worker nodes, on 7 of which Filebeat refuses to process logs
- Filebeat version 7.10.2
- Running in AKS
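
For context, here's roughly what the Terraform-driven install amounts to as a plain Helm command. The release name, namespace, and values file below are placeholders, not my exact setup:

```
# Add the official Elastic chart repo and install the filebeat chart,
# pinned to the version we're running. Names here are placeholders.
helm repo add elastic https://helm.elastic.co
helm repo update
helm install filebeat elastic/filebeat --version 7.10.2 \
  --namespace logging --create-namespace \
  -f filebeat-values.yaml
```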
Looking through the logs, I can't tell why some Filebeat pods work and some don't.
I've created a gist with:
- Logs from a good pod (DEBUG enabled, as sketched below)
- Logs from a bad pod (DEBUG enabled)
- The filebeat.yaml used to deploy across all the nodes
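
In case anyone wants to reproduce the DEBUG output: it was turned on with the standard Filebeat setting `logging.level: debug`. The namespace and label selector below are placeholders:

```
# DEBUG was enabled by setting the standard Filebeat option in the
# chart-managed filebeat.yml:
#
#   logging.level: debug
#
# then bouncing the pods so the DaemonSet re-reads the config
# (namespace and label selector are placeholders):
kubectl -n logging delete pods -l app=filebeat-filebeat
```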
Things I've tried:
- Destroying and recreating the EFK stack and Filebeat
- Wiping the beats-data persistent storage location on the worker nodes, for both working and non-working Filebeat pods
- Rebooting the worker nodes
- Verifying that the logs are mounted and readable from within the pods (see the commands below)
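
For that last point, the check looked roughly like this (pod name, namespace, and log file are placeholders for my environment):

```
# Exec into one of the non-working pods and confirm the host log
# mount is present and readable (names/paths are placeholders).
kubectl -n logging exec -it filebeat-filebeat-xxxxx -- ls -l /var/log/containers/
kubectl -n logging exec -it filebeat-filebeat-xxxxx -- head /var/log/containers/<some-container>.log
```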
This will sound crazy, but the only common thread I can find is that the non-working Filebeat pods are all on worker nodes whose names end in a letter. I assume it's coincidence, but thought I'd mention it.
Thanks in advance for any help. I do appreciate it.