Losing the last log lines of a terminating or crashing container


We are trying to set up Elastic Cloud on Kubernetes 1.4, using Filebeat 7.11.1 to harvest the logs of all containers running on our Kubernetes cluster.

Since annotating every Pod we might be interested in looked more difficult, we are using autodiscovery. Everything works, except that Filebeat loses the last few log lines when a Pod is terminated gracefully or when a Pod crashes.

It doesn't always happen; sometimes the logs are collected correctly even when the container crashes outright. After some testing, I think it depends on when the logs are written to the json-file relative to when the container actually terminates: if the last lines are flushed to the json-file and the container stops almost immediately afterwards, Filebeat loses those lines.

I've searched around and read about a few known problems involving Docker and Kubernetes events combined with autodiscovery, but I couldn't find a proper solution.

Here is the Filebeat definition:

```yaml
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: elasticsearch
  namespace: elastic-system
spec:
  type: filebeat
  version: 7.11.1
  elasticsearchRef:
    name: elasticsearch
  kibanaRef:
    name: elasticsearch
  config:
    setup.template.enabled: true
    setup.template.name: "filebeat"
    setup.template.overwrite: true
    setup.template.settings:
      _source.enabled: true
      index.number_of_shards: 5
      index.number_of_replicas: 2
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          cleanup_timeout: 60
          hints.enabled: true
          # hints.default_config:
            # type: container
            # paths:
            # - /var/lib/docker/containers/${data.container.id}/*.log
            # - /var/log/pods/${data.kubernetes.pod.uid}/${data.kubernetes.container.name}/*.log
            # multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
            # multiline.negate: false
            # multiline.match: after
            # json.message_key: log
    processors:
    - add_host_metadata:
        netinfo.enabled: true
    - add_kubernetes_metadata:
        in_cluster: true
    - add_process_metadata:
        match_pids: [system.process.ppid]
        target: system.process.parent
    - drop_event:
        when:
          or:
            - equals: # ignore itself
                kubernetes.container.name: "filebeat"
            - equals: # ignore metallb objects
                kubernetes.namespace: "metallb-system"
            - equals: # ignore argocd objects
                kubernetes.namespace: "argocd"
            - equals: # ignore lens-metrics objects
                kubernetes.namespace: "lens-metrics"
            - equals: # ignore Percona haproxy
                kubernetes.container.name: "haproxy"
            - equals: # ignore Rook-Ceph csi-snapshotter
                kubernetes.container.name: "csi-snapshotter"
            - equals: # ignore Kubernetes coredns
                kubernetes.container.name: "coredns"
            - equals: # ignore Vasco simulator
                kubernetes.container.name: "amis-vasco-simulator"
            - regexp: # ignore debug or trace logs
                message: "(DBG|DEBUG|TRACE|debug|trace)"
            - regexp: # ignore empty lines
                message: "^$"
            - regexp: # ignore debug or trace logs
                message: "<(Trace|Debug)>"
            - contains: # ignore probes
                message: "Health check succeeded"
            - contains: # ignore probes
                message: "kube-probe"
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: false # A true value would allow to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
          volumeMounts:
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
            readOnly: true
          - name: varlog
            mountPath: /var/log
            readOnly: true
          # persisted path on container. This is expected to be persisted between runs
          - name: data
            mountPath: /usr/share/filebeat/data
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            limits:
              cpu: 200m
              memory: 400Mi
            requests:
              cpu: 100m
              memory: 200Mi
        volumes:
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        # persisted path on host will be mounted under persisted path of the container
        - name: data
          hostPath:
            path: /var/lib/filebeat-data
            type: DirectoryOrCreate
```

I've tried various configurations of hints.default_config, but none of them changed anything regarding this problem.

One thing I've noticed is that the json-file log of a terminating container is removed as soon as the container starts terminating, which I suspect is the main culprit for the missing logs of terminating containers.

For crashing containers, by contrast, the json-file log remains (it can still be read with kubectl logs --previous), but the last lines are still not harvested if they are written too late.

Am I configuring something wrong? Are there any tips on how things should be set up to prevent this kind of problem?


This looks similar to "[filebeat] Sometimes Pod logs are not collected by Filebeat" (elastic/beats issue #17396 on GitHub), which was fixed by "Fix terminating pod autodiscover issue" (elastic/beats pull request #20084 by ChrsMark); version 7.11 includes that fix. Could you check, as was done in that GitHub issue, whether "the harvester is terminated after the pod has finished completely"?
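One way to check this is to raise Filebeat's own log level temporarily, so that harvester and autodiscover activity appears in the Filebeat Pod's logs. The fragment below is a sketch that assumes the two keys are merged into the existing Beat resource's spec.config; enabling debug output for every component is verbose and is meant for a test cluster while reproducing the problem.

```yaml
# Sketch only: merge these keys into the existing Beat resource.
# "*" enables debug output for all Filebeat components, which is noisy;
# use it only while reproducing the problem.
spec:
  config:
    logging.level: debug
    logging.selectors: ["*"]
```

Since the drop_event processor above discards events from the container named "filebeat", these messages will not reach Elasticsearch; they can be read with kubectl logs on the Filebeat Pod and their timestamps compared with the time the affected Pod terminated.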

Also, I wonder whether your target Pods are in a state that Autodiscovery doesn't handle (similar to what was happening in the other issue).

Last but not least, it would help if you could provide a reproduction scenario similar to the one in the aforementioned GitHub issue, so that we can try to reproduce and debug the problem.
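A minimal reproduction along those lines could be a Pod that writes a burst of numbered lines and exits immediately, so that its last lines reach the json-file only moments before the container stops. The manifest below is a hypothetical sketch: the Pod name, namespace, image tag and line count are arbitrary choices, not taken from the original report.

```yaml
# Hypothetical reproduction Pod: prints 1000 numbered lines as fast as
# possible, then exits right away with a non-zero code to mimic a crash.
apiVersion: v1
kind: Pod
metadata:
  name: log-burst-test   # illustrative name
  namespace: default     # not excluded by the drop_event conditions above
spec:
  restartPolicy: Never
  containers:
    - name: log-burst
      image: busybox:1.33
      command: ["sh", "-c"]
      args:
        - |
          i=1
          while [ "$i" -le 1000 ]; do
            echo "line $i"
            i=$((i + 1))
          done
          exit 1
```

If the highest line number indexed in Elasticsearch for this Pod is below 1000, the final lines were lost. Replacing exit 1 with exit 0 turns the simulated crash into a clean completion, which helps to tell the two cases apart.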
