We have an established Filebeat ==(lumberjack)==> Logstash => Elasticsearch deployment for Docker container logs coming from a bespoke orchestration platform. We are now replacing our bespoke platform with Kubernetes.
We have Filebeat configured to run one instance per cluster node to harvest every Kubernetes container log from /var/log/containers/, where the log files are named by Kubernetes convention from the Kubernetes Namespace, Pod, and container names. Filebeat is configured very similarly to the Canonical Kubernetes distribution.
When the logs arrive at Logstash, we can grok the Namespace, Pod, and container names from the filename field, but we do not have any of the Kubernetes annotations/tags/metadata.
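To make concrete what the filename does and does not give us, here is a minimal Python sketch of the equivalent extraction (an illustration, not our actual grok pattern). It assumes the kubelet's `<pod>_<namespace>_<container>-<container_id>.log` naming with 64-character hexadecimal Docker container IDs; `parse_log_path` is a hypothetical helper name. Labels and annotations never appear in the filename, which is exactly the gap.

```python
import os
import re

# Assumed kubelet naming for the symlinks in /var/log/containers/:
#   <pod>_<namespace>_<container>-<container_id>.log
# Pod and namespace names cannot contain underscores; the container ID is
# assumed to be a 64-character hex Docker ID.
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)"
    r"-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_log_path(path):
    """Return the Kubernetes names encoded in a container log path, or None."""
    match = LOG_NAME.match(os.path.basename(path))
    return match.groupdict() if match else None
```

For example, `parse_log_path("/var/log/containers/web-1_default_log-shipper-<64 hex chars>.log")` yields the pod `web-1`, namespace `default`, and container `log-shipper`, but nothing else about the Pod.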
Another popular distribution of Kubernetes uses Fluentd instead of Beats to ship logs. In this distribution, a Fluentd Kubernetes metadata plugin gathers the Kubernetes metadata and adds it to the logs before shipping. Sadly, Fluentd does not appear to be able to send logs to Logstash via the lumberjack protocol (although there are plugins to ship in the other direction, from Logstash to Fluentd), and Filebeat does not appear to have a hook for injecting the same information.
At the moment, our path forward seems to be to regenerate the Filebeat configuration file each time a new Pod is provisioned, so that we can statically specify all the log file names with their metadata as fields, and then restart Filebeat.
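The regeneration approach would look roughly like the following Python sketch. It assumes Filebeat 5.x option names (`filebeat.prospectors`, `input_type`, `paths`, `fields`, `output.logstash.hosts`); check them against the Filebeat version in use. The pod dictionaries (`name`, `namespace`, `containers`, `labels`) are a hypothetical shape we would have to populate from the Kubernetes API ourselves, and watching for new Pods and restarting Filebeat are left out.

```python
import json


def prospector_for(pod):
    """Build one Filebeat 5.x-style log prospector covering a single Pod."""
    return {
        "input_type": "log",
        # One glob per container; the trailing * matches the container ID.
        "paths": [
            f"/var/log/containers/{pod['name']}_{pod['namespace']}_{container}-*.log"
            for container in pod["containers"]
        ],
        # Static metadata that Filebeat attaches to every event from these files.
        "fields": {
            "kubernetes": {
                "namespace": pod["namespace"],
                "pod": pod["name"],
                "labels": pod.get("labels", {}),
            }
        },
    }


def render_config(pods, logstash_hosts):
    """Render a full config as JSON, which YAML parsers generally accept."""
    config = {
        "filebeat.prospectors": [prospector_for(p) for p in pods],
        "output.logstash": {"hosts": logstash_hosts},
    }
    return json.dumps(config, indent=2, sort_keys=True)
```

The drawback is visible in the sketch: every Pod change means rewriting the whole file and restarting Filebeat, which is why we are looking for something better.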
Do you have any better suggestions for shipping Kubernetes metadata with Filebeat? Are there any future plans to address this metadata issue for Kubernetes, and potentially other platforms?