Hello,
We are trying to set up Elastic Cloud on Kubernetes 1.4, using Filebeat 7.11.1 to harvest the logs of all containers running on our Kubernetes cluster.
Since annotating every Pod we could potentially be interested in looked more difficult, we are trying to use autodiscovery. Everything works perfectly, except that Filebeat loses the last few lines before a Pod is terminated gracefully, or when a Pod crashes.
It doesn't always happen: sometimes logs are collected correctly even on a full crash of the container. After some testing, I think the problem is related to when log lines are written to the json-file relative to when the container actually terminates: if the lines are flushed too late and the container stops almost immediately afterwards, Filebeat loses them.
I've searched around and read that there are a few known problems involving Docker and Kubernetes events together with autodiscovery, but I couldn't find a proper solution.
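For context, the per-Pod annotation approach we wanted to avoid would have meant adding Filebeat hints to every workload, something along these lines (`my-app` is just a placeholder, and the multiline settings are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app  # hypothetical workload
  annotations:
    # Filebeat hints: enable collection and set per-container options
    co.elastic.logs/enabled: "true"
    co.elastic.logs/multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
    co.elastic.logs/multiline.negate: "false"
    co.elastic.logs/multiline.match: "after"
spec:
  containers:
    - name: my-app
      image: my-app:latest
```

Doing this for every Deployment and StatefulSet seemed harder to maintain than a single autodiscover provider, which is why we went with hints-based autodiscovery instead.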
Here is the Filebeat definition:
```yaml
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: elasticsearch
  namespace: elastic-system
spec:
  type: filebeat
  version: 7.11.1
  elasticsearchRef:
    name: elasticsearch
  kibanaRef:
    name: elasticsearch
  config:
    setup.template.enabled: true
    setup.template.name: "filebeat"
    setup.template.overwrite: true
    setup.template.settings:
      _source.enabled: true
      index.number_of_shards: 5
      index.number_of_replicas: 2
    filebeat:
      autodiscover:
        providers:
          - type: kubernetes
            node: ${NODE_NAME}
            cleanup_timeout: 60
            hints.enabled: true
            # hints.default_config:
            #   type: container
            #   paths:
            #     - /var/lib/docker/containers/${data.container.id}/*.log
            #     - /var/log/pods/${data.kubernetes.pod.uid}/${data.kubernetes.container.name}/*.log
            #   multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
            #   multiline.negate: false
            #   multiline.match: after
            #   json.message_key: log
    processors:
      - add_host_metadata:
          netinfo.enabled: true
      - add_kubernetes_metadata:
          in_cluster: true
      - add_process_metadata:
          match_pids: [system.process.ppid]
          target: system.process.parent
      - drop_event:
          when:
            or:
              - equals: # ignore itself
                  kubernetes.container.name: "filebeat"
              - equals: # ignore metallb objects
                  kubernetes.namespace: "metallb-system"
              - equals: # ignore argocd objects
                  kubernetes.namespace: "argocd"
              - equals: # ignore lens-metrics objects
                  kubernetes.namespace: "lens-metrics"
              - equals: # ignore Percona haproxy
                  kubernetes.container.name: "haproxy"
              - equals: # ignore Rook-Ceph csi-snapshotter
                  kubernetes.container.name: "csi-snapshotter"
              - equals: # ignore Kubernetes coredns
                  kubernetes.container.name: "coredns"
              - equals: # ignore Vasco simulator
                  kubernetes.container.name: "amis-vasco-simulator"
              - regexp: # ignore debug or trace logs
                  message: "(DBG|DEBUG|TRACE|debug|trace)"
              - regexp: # ignore empty lines
                  message: "^$"
              - regexp: # ignore debug or trace logs
                  message: "<(Trace|Debug)>"
              - contains: # ignore probes
                  message: "Health check succeeded"
              - contains: # ignore probes
                  message: "kube-probe"
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: false # A true value would allow richer host metadata
        containers:
          - name: filebeat
            securityContext:
              runAsUser: 0
            volumeMounts:
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
                readOnly: true
              - name: varlog
                mountPath: /var/log
                readOnly: true
              # persisted path on the container; expected to survive between runs
              - name: data
                mountPath: /usr/share/filebeat/data
            env:
              - name: NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
            resources:
              limits:
                cpu: 200m
                memory: 400Mi
              requests:
                cpu: 100m
                memory: 200Mi
        volumes:
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: varlog
            hostPath:
              path: /var/log
          # persisted path on the host, mounted under the container's data path
          - name: data
            hostPath:
              path: /var/lib/filebeat-data
              type: DirectoryOrCreate
```
I've tried various configurations around hints.default_config, but none of them really changed anything regarding this problem.
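For example, one of the variants was along these lines (the specific close_timeout / close_removed / clean_removed values are illustrative):

```yaml
filebeat:
  autodiscover:
    providers:
      - type: kubernetes
        node: ${NODE_NAME}
        cleanup_timeout: 60
        hints.enabled: true
        hints.default_config:
          type: container
          paths:
            - /var/lib/docker/containers/${data.container.id}/*.log
          # Keep the harvester open for a while after EOF / file removal,
          # hoping the last lines are picked up before cleanup
          close_timeout: 5m
          close_removed: false
          clean_removed: false
```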
One thing I've noticed is that the json-file log of a gracefully terminating container is removed as soon as termination starts, which I imagine is the main culprit behind logs not being harvested for terminating containers.
For crashing containers, on the other hand, the json-file log remains (it can be consulted with kubectl logs --previous), but the last lines are still not harvested if they are written too late into the log file.
Am I configuring something wrong? Are there any tips on how things should be set up to prevent this kind of problem?