Filebeat 7.12 event collection rate drops after a period of time

I have deployed Filebeat 7.12 on a Kubernetes cluster to collect log events. The deployment went smoothly and Filebeat started collecting events as expected.

But after about a day, once it had collected 50K+ events, the rate at which events were being collected went down.

There used to be at most a 200-event difference between the Kubernetes logs and the events stored in Elasticsearch.

But after a day the difference grew to 6K, which is how I concluded that Filebeat has slowed down.

filebeat.yaml -

  filebeat.autodiscover:
    providers:
      - type: kubernetes
        node: ${NODE_NAME}
        tags:
          - "kube-logs"
        templates:
          - condition.or:
              - contains:
                  kubernetes.pod.name: "ne-db-manager"
              - contains:
                  kubernetes.pod.name: "ne-mgmt"
              - contains:
                  kubernetes.pod.name: "list-manager"
              - contains:
                  kubernetes.pod.name: "scheduler-mgmt"
              - contains:
                  kubernetes.pod.name: "sync-ne"
              - contains:
                  kubernetes.pod.name: "file-manager"
              - contains:
                  kubernetes.pod.name: "dash-board"
              - contains:
                  kubernetes.pod.name: "config-manager"
              - contains:
                  kubernetes.pod.name: "report-manager"
              - contains:
                  kubernetes.pod.name: "clean-backup"
              - contains:
                  kubernetes.pod.name: "warrior"
              - contains:
                  kubernetes.pod.name: "ne-ops"
            config:
              - type: container
                paths:
                  - "/var/log/containers/*-${data.kubernetes.container.id}.log"
                multiline.type: pattern
                multiline.pattern: '^[[:space:]]'
                multiline.negate: false
                multiline.match: after
                close_inactive: 2m
                ignore_older: 5m
  filebeat.shutdown_timeout: 5s
  logging.level: debug
  processors:
    - drop_event:
        when.or:
          - equals:
              kubernetes.namespace: "kube-system"
          - equals:
              kubernetes.namespace: "default"
          - equals:
              kubernetes.namespace: "logging"
    - fingerprint:
        fields: ["message"]
        target_field: "@metadata._id"
  output.logstash:
    hosts: ["logstash-headless.logging:5044"]
    index: filebeat
    pretty: true
  setup.template.name: "filebeat"
  setup.template.pattern: "filebeat-*"

The resources available are pretty good, so I am not sure what is behind this slowness. Can anyone please help me understand the cause and how to resolve it?

As with anything related to ingest slowness, which can have many different causes, you should always start with the logs. Begin with Filebeat and Logstash, and if nothing turns up there, check Elasticsearch as well.

Outside of error logging, Filebeat also logs internal metrics every 30 seconds. These metrics include things like how many events are in the queue; if that queue keeps growing, it means Logstash or Elasticsearch is not able to keep up. Any major errors anywhere should also become quite visible in the logs.
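If you want to keep an eye on those metrics, the relevant settings are logging.metrics.enabled and logging.metrics.period, both of which already default to enabled and 30 seconds. A minimal sketch of making them explicit in the config:

  # Periodic internal metrics (queue fill, published/acked event counts, etc.)
  # Enabled by default; shown here only to make the period explicit.
  logging.metrics.enabled: true
  logging.metrics.period: 30s

With logging.level set to debug, as in the config above, these lines can be easy to miss among the rest of the output, so grepping the Filebeat pod logs for "Non-zero metrics" is a quick way to isolate them.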

In almost all cases, the logs will either point you directly to the issue (and hopefully a solution), or at least give good hints about what it might be.

Look for entries marked ERROR or WARNING/WARN, for words like "backpressure", or for HTTP responses in the 400-500 range; they should all be logged.
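If the logs do confirm backpressure from Logstash or Elasticsearch, one common mitigation on the Filebeat side is to enlarge the internal memory queue and give the Logstash output more workers and larger batches, so short downstream stalls do not immediately throttle the inputs. A rough sketch; the numbers are illustrative starting points, not tuned recommendations for this cluster:

  # Bigger in-memory buffer between the inputs and the output
  queue.mem:
    events: 8192
    flush.min_events: 2048
    flush.timeout: 1s

  output.logstash:
    hosts: ["logstash-headless.logging:5044"]
    index: filebeat
    # More parallel connections and larger batches; this only helps if
    # Logstash and Elasticsearch themselves have spare capacity.
    worker: 2
    bulk_max_size: 2048

None of this replaces finding the actual bottleneck, though; if Logstash or Elasticsearch is saturated, a larger Filebeat queue only delays the point at which events start lagging again.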
