Hi everyone,
I would like to understand if it would be possible to enhance the current add_kubernetes_metadata mechanism so that it could fetch the metadata starting from a pod-uid instead.
In our current scenario we have a Kubernetes cluster with several pods on different nodes, using Filebeat to stream the logs to an Elasticsearch host. The applications write to several different log files and, for this and some other reasons, we cannot simply rely on the stdout and stderr outputs for Filebeat. Instead, we have created volumes on the host file system and mounted them into the Filebeat pod so that it can read the log files and send them.
This works fine, except that all events appear to originate from the Filebeat pod and we lose all the Kubernetes metadata that normally gets appended.
The pods' log volumes are mounted in:
/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>
The Filebeat configuration:

apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.config:
      prospectors:
        # Mounted `filebeat-prospectors` configmap:
        path: ${path.config}/prospectors.d/*.yml
        # Reload prospectors configs as they change:
        reload.enabled: false
      modules:
        path: ${path.config}/modules.d/*.yml
        # Reload module configs as they change:
        reload.enabled: false
    processors:
      - add_cloud_metadata:
      - add_kubernetes_metadata:
          in_cluster: true
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}
      index: "filebeat-%{[beat.version]}-logs-%{+xxxx.ww}"
    setup.template.name: "filebeat-%{[beat.version]}"
    setup.template.pattern: "filebeat-%{[beat.version]}-*"
kind: ConfigMap

apiVersion: v1
data:
  kubernetes.yml: |-
    - type: log
      paths:
        - /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/applogs/*.log
      exclude_files: ['\.gz$', 'gc.log']
kind: ConfigMap

A sample event has the following source:
/var/lib/kubelet/pods/005f3b90-4b9d-12f8-acf0-31020a840133/volumes/kubernetes.io~empty-dir/applogs/server.log
The current code tries to extract the container ID from either /var/lib/docker/containers or /var/log/containers, so unless I am overlooking some configuration, it won't work for paths like the one above.
Perhaps there are a few ways to enhance the logic to help in such a situation:
- Use the pod-uid to find the related /var/lib/docker/containers path and hook this logic in before the Matcher kicks in.
- Create a second type of Matcher and use the configuration file to specify options and paths for it.
- Modify the add_kubernetes_metadata logic to work with a pod-uid too.
All options would need to take into consideration the possibility of multiple containers per pod (perhaps by mounting the log volumes in different subpaths).
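
To make the idea more concrete, here is a rough sketch of the lookup I have in mind. It is written in Python purely for illustration and is not Beats code: the names parse_source, enrich and pods_by_uid are hypothetical, and the path layout is assumed to be exactly the one shown in the sample event above. The index of pod metadata keyed by pod UID is also an assumption; in practice it would have to be fed from the Kubernetes API.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the sample event source above:
# /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>/<file>
POD_VOLUME_PATH = re.compile(
    r"^/var/lib/kubelet/pods/(?P<pod_uid>[^/]+)"
    r"/volumes/kubernetes\.io~empty-dir/(?P<volume>[^/]+)/"
)


class PodVolumeRef(NamedTuple):
    pod_uid: str
    volume: str


def parse_source(source: str) -> Optional[PodVolumeRef]:
    """Return the pod UID and volume name encoded in a log path, or None."""
    match = POD_VOLUME_PATH.match(source)
    if match is None:
        return None
    return PodVolumeRef(match.group("pod_uid"), match.group("volume"))


def enrich(event: dict, pods_by_uid: dict) -> dict:
    """Attach pod metadata to an event whose source maps to a known pod UID.

    pods_by_uid is a hypothetical index (pod UID -> metadata dict).
    Events that do not match, or whose pod is unknown, are returned unchanged.
    """
    ref = parse_source(event.get("source", ""))
    if ref is None:
        return event
    pod = pods_by_uid.get(ref.pod_uid)
    if pod is None:
        return event
    enriched = dict(event)  # shallow copy so the input event is not mutated
    enriched["kubernetes"] = {**pod, "volume": {"name": ref.volume}}
    return enriched
```

The volume name is kept in the result because it might be one way to tell containers apart when a pod has several of them, for example by giving each container its own log volume or subpath. That part is only a guess at how the multi-container case could be handled.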
I'd be more than happy to work on this and create a PR, but first want to understand the different points of view and the recommendations, if any.
Cheers!