As the title suggests, I believe that when matching on pod UID (/var/log/pods) with the below config, Filebeat still cannot get the metadata for the second file once kubelet rotates it in AKS, even though this was fixed and backported into release v7.17.1. I am not sure whether this affects other managed Kubernetes offerings, as we only use AKS.
I believe the issue is here: when AKS rotates the file it appends a number to the name, e.g. "*.log.12451251", so Filebeat skips from line 101 down to line 135, no pod UID is extracted, and the event cannot be matched. Is there a reason the check isn't the below instead, so that it picks up both scenarios?
if strings.Contains(source, ".log")
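To make the difference concrete, here is a minimal, self-contained sketch. It is not the actual Beats source: the function names (isPodLogStrict, isPodLogLoose, podUIDFromPath) are hypothetical, the "ends in .log" check is my assumption about what the current code does, and the directory layout /var/log/pods/<namespace>_<pod>_<uid>/<container>/<file> is assumed as well.

```go
package main

import (
	"fmt"
	"strings"
)

// isPodLogStrict is an ASSUMED stand-in for the current behaviour:
// only paths that end in ".log" are treated as pod log files.
func isPodLogStrict(source string) bool {
	return strings.HasSuffix(source, ".log")
}

// isPodLogLoose is the check proposed in this post: it also accepts
// rotated files such as "0.log.12451251".
func isPodLogLoose(source string) bool {
	return strings.Contains(source, ".log")
}

// podUIDFromPath is a hypothetical helper. It assumes the layout
// <logsPath><namespace>_<pod>_<uid>/<container>/<file> and returns the
// part of the pod directory name after the last underscore.
func podUIDFromPath(logsPath, source string, match func(string) bool) (string, bool) {
	if !strings.HasPrefix(source, logsPath) || !match(source) {
		return "", false
	}
	parts := strings.Split(strings.TrimPrefix(source, logsPath), "/")
	if len(parts) < 3 {
		return "", false
	}
	podDir := parts[0]
	i := strings.LastIndex(podDir, "_")
	if i < 0 {
		return "", false
	}
	return podDir[i+1:], true
}

func main() {
	const logsPath = "/var/log/pods/"
	rotated := logsPath + "ns_pod_uid123/app/0.log.12451251"

	_, strictOK := podUIDFromPath(logsPath, rotated, isPodLogStrict)
	uid, looseOK := podUIDFromPath(logsPath, rotated, isPodLogLoose)
	fmt.Println("strict matched:", strictOK)
	fmt.Println("loose matched:", looseOK, "uid:", uid)
}
```

With the strict check the rotated path is rejected before the UID is ever parsed; with the loose check both the live and the rotated file yield the same UID. One trade-off to keep in mind: a Contains check would also accept any other name containing ".log", for example a compressed rotated file, if such files exist in the directory.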
Error from filebeat debug logs:
2022-03-07T07:06:07.991Z DEBUG [kubernetes] add_kubernetes_metadata/kubernetes.go:278 Index key debrief-test_debrief-eventhub-consumer-dws-12ffasfgag44-w9cvs_da5d did not match any of the cached resources {"libbeat.processor": "add_kubernetes_metadata"}
I believe the events hitting this error are the ones read from rotated files: Filebeat cannot extract the pod UID because the file name ends in ".log.124312" rather than simply ".log".
In the meantime I am going to try the autodiscover configuration to see if that helps matters as I believe this discovers kubernetes metadata differently.
I probably should have mentioned that I also had an enterprise support issue open: we are losing thousands of logs per minute for one application that logs heavily. When checking metrics we noticed a high number of truncated files, which I believe is caused by the large volume of data logged at peak load, as it is fine overnight. Elastic support agreed we should either not rotate (managed AKS makes this pretty hard to change) or at least try to scan both files. In light of this we had to switch from /var/log/containers, which holds a symlink to a single file (rotated files are not included), to /var/log/pods. But yes, this latest challenge means Kubernetes metadata only works for one file.
I also switched to the below config in the meantime, using autodiscover, and it appears to have resolved the issue too. However, the PR may still stop somebody else tripping over this in future.
I only applied it to prod recently so will monitor it tomorrow to be sure.
I also noticed the pod UID wasn't extracted, but I assumed that was because that part of the code is skipped when the file name doesn't end in .log, so the later steps that extract it never run.