Hello there,
I've already read a lot of docs and issues but haven't found a solution to my problem, so hopefully someone can help.
Our setup is pretty simple: Filebeat runs as a DaemonSet in a Kubernetes cluster, so each node has a Filebeat pod that collects the logs of containers with a specific label and forwards them to Elastic.
The Filebeat configuration is as follows:
setup.ilm:
  enabled: false
setup.template:
  enabled: false
  name: "%{[kubernetes][labels][elastic-index]}"
  pattern: "%{[kubernetes][labels][elastic-index]}-*"
processors:
  - decode_json_fields:
      fields: ["message"]
      process_array: true
      max_depth: 10
      target: ""
      overwrite_keys: false
      add_error_key: false
  - add_cloud_metadata: ~
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
    processors:
      - add_kubernetes_metadata:
          in_cluster: true
      - drop_event:
          when:
            not:
              regexp:
                kubernetes.labels.elastic-index: ".*"
The problem is that when a Filebeat pod gets restarted (after an OOM kill, for example), it goes into some kind of "crazy mode": it rewrites the registry file every second
4167578795.json active.dat log.json meta.json
sh-4.2# ls
4167601926.json active.dat log.json meta.json
sh-4.2# ls
4167625055.json active.dat log.json meta.json
sh-4.2# ls
4167648187.json active.dat log.json meta.json
sh-4.2# ls
4167648187.json active.dat log.json meta.json
and that file contains thousands of lines
sh-4.2# cat 4170146226.json | wc
2705 2706 1147578
but only for a few log files
cat 4170146226.json | jq -r '.[].source' | sort -u | wc
60 60 9960
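To see which of those 60 sources account for the repeated entries, something like the following Python sketch could give a per-file breakdown. It assumes the snapshot file is a JSON collection of objects that each carry a `source` field, which is what the jq query above relies on; the function name `count_entries_per_source` is made up for illustration and is not part of Filebeat.

```python
import json
import sys
from collections import Counter


def count_entries_per_source(registry_path):
    """Count how many registry entries point at each 'source' path.

    Assumption: the file is a JSON array (or object) whose elements are
    objects with a 'source' field, as the jq query `.[].source` implies.
    """
    with open(registry_path, encoding="utf-8") as fh:
        data = json.load(fh)
    # jq's `.[]` iterates arrays and objects alike, so accept both shapes.
    entries = data.values() if isinstance(data, dict) else data
    # Entries without a 'source' field are skipped rather than counted.
    return Counter(
        entry["source"]
        for entry in entries
        if isinstance(entry, dict) and "source" in entry
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the ten sources with the most entries, highest count first.
    for source, count in count_entries_per_source(sys.argv[1]).most_common(10):
        print(count, source)
```

Saved under a name of your choice and run with the snapshot file as its only argument, it prints the ten sources with the most entries, which should show whether the duplicates are spread evenly or concentrated on a few files.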
This is the output from a node in the same k8s cluster where Filebeat is in the "right" mode:
4675893.json active.dat log.json meta.json
sh-4.2# ls
4675893.json active.dat log.json meta.json
sh-4.2# ls
4675893.json active.dat log.json meta.json
sh-4.2# ls
4675893.json active.dat log.json meta.json
sh-4.2# ls
4675893.json active.dat log.json meta.json
sh-4.2# cat 4675893.json | wc
35 36 14383
And of course the logs from the node with this "crazy" Filebeat pod are lost.
We're using
image: docker.elastic.co/beats/filebeat
imageTag: 7.10.1
Any ideas on how to debug this (besides increasing the memory limits for the pod to prevent it from getting killed)?