Filebeat 7.10 fails to collect events from multiple kubernetes pods

bhavaniprasad_reddy · March 16, 2021, 6:00pm

Filebeat is configured to collect events from multiple Kubernetes pods using an `or` condition. Events from one specific pod are collected continuously, but events from another pod are collected very slowly, and after some time no events are collected at all.

Commenting out all the other pods and leaving a single one in the configuration works well, and the events show up in Elasticsearch quickly.

There are 3 worker nodes, and Filebeat (v7.10.2) runs on each of them as a DaemonSet. Each Filebeat pod has a CPU limit of 4 cores and a memory limit of 4 GB. One index is generated per day, and its size does not exceed 2 GB.

I want Filebeat to collect events from all the pods and ship them to Elasticsearch with minimal delay. Please help me understand the issue and the best practices for improving Filebeat performance.

filebeat.yml -

  filebeat.autodiscover:
    providers:
      - type: kubernetes
        node: ${NODE_NAME}
        tags:
          - "kube-logs"
        templates:
          - condition.or:
              - contains:
                  kubernetes.pod.name: "ne-db-manager"
              - contains:
                  kubernetes.pod.name: "ne-mgmt"
              - contains:
                  kubernetes.pod.name: "list-manager"
              - contains:
                  kubernetes.pod.name: "scheduler-mgmt"
              - contains:
                  kubernetes.pod.name: "sync-ne"
              - contains:
                  kubernetes.pod.name: "file-manager"
              - contains:
                  kubernetes.pod.name: "dash-board"
              - contains:
                  kubernetes.pod.name: "config-manager"
              - contains:
                  kubernetes.pod.name: "report-manager"
              - contains:
                  kubernetes.pod.name: "clean-backup"
              - contains:
                  kubernetes.pod.name: "warrior"
              - contains:
                  kubernetes.pod.name: "ne-backup"
              - contains:
                  kubernetes.pod.name: "ne-restore"
            config:
              - type: container
                paths:
                  - "/var/log/containers/*-${data.kubernetes.container.id}.log"
                multiline.type: pattern
                multiline.pattern: '^[[:space:]]'
                multiline.negate: false
                multiline.match: after
  logging.level: debug
  processors:
    - drop_event:
        when.or:
           - equals:
               kubernetes.namespace: "kube-system"
           - equals:
               kubernetes.namespace: "default"
           - equals:
               kubernetes.namespace: "logging"
  output.logstash:
    hosts: ["logstash-service.logging:5044"]
    index: filebeat
    pretty: true
  setup.template.name: "filebeat"
  setup.template.pattern: "filebeat-*"

BenB196 · March 17, 2021, 10:55pm

I'm not very familiar with Kubernetes or with how its discovery is implemented in Filebeat, but here is one thing you may want to try. Instead of using a condition.or with a long list of contains clauses, have you tried giving all the pods a single label (e.g. logging: filebeat) and then selecting everything with that label? One potential issue is that all of your contains clauses could be slowing down discovery. Using a single label as the selector for all of the pods might help in two ways: Filebeat only has to evaluate one condition, and that condition is an exact match rather than a substring match.
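
A minimal sketch of what that could look like, assuming every pod to be collected has been given a `logging: filebeat` label (that label is only the example from this reply, it is not part of the original configuration). Autodiscover exposes pod labels under `kubernetes.labels.<name>`, so the thirteen contains clauses collapse into one equals condition, and the input config is carried over unchanged:

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      tags:
        - "kube-logs"
      templates:
        # Assumption: the pods carry the label logging=filebeat.
        # One exact-match condition replaces the list of contains clauses.
        - condition:
            equals:
              kubernetes.labels.logging: "filebeat"
          config:
            - type: container
              paths:
                - "/var/log/containers/*-${data.kubernetes.container.id}.log"
              multiline.type: pattern
              multiline.pattern: '^[[:space:]]'
              multiline.negate: false
              multiline.match: after
```

Whether this changes the collection rate is untested here; it mainly simplifies the template so that pod selection is easier to rule out as the cause.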

BenB196 · March 18, 2021, 6:47pm

I would recommend enabling metrics collection on these Filebeat nodes. It should give you some useful insight into where you're running into a limitation.

On a somewhat related note, you mention having only 3 worker nodes, but you also say you have at least 600 pods for ne-backup. That would put you at a minimum of 200 pods per node, which is double the recommended 100 pods per node that Kubernetes is designed for. You may be running into some sort of Kubernetes constraint.

bhavaniprasad_reddy · March 19, 2021, 5:10am

Actually, I am using OKD 3.11 to deploy my applications, and OpenShift supports 200 pods per node.
Also, Grafana is already present in my cluster, and I do not see any resource shortages.

CPU Memory limits -

NOTE: I see that Filebeat is harvesting the 'ne-db-manager' logs but is unable to collect events from them. It is able to collect events from the 'ne-backup' pods, which create a large number of log files.

BenB196 · March 19, 2021, 2:24pm

Apologies, I wasn't clear in my statement about monitoring Filebeat. I meant collecting the metrics that Filebeat itself exposes via its metrics option. These include far more information about events processed, events queued, and so on. You might not be hitting a CPU or memory limit, but you could be hitting some other limit within your environment.
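
As a rough sketch of how those internal metrics can be switched on in filebeat.yml: the settings below are standard Filebeat 7.x options, but the host and port values are only the usual defaults and are assumptions about this environment.

```yaml
# Periodically write a snapshot of Filebeat's internal counters to its own log.
logging.metrics.enabled: true
logging.metrics.period: 30s

# Expose the same counters over a local HTTP endpoint.
# Assumption: localhost:5066 is free inside the Filebeat container.
http.enabled: true
http.host: localhost
http.port: 5066
```

With the HTTP endpoint enabled, the counters can be read from inside the Filebeat pod by requesting `http://localhost:5066/stats?pretty`, for example with curl. The pipeline and output sections are a reasonable place to start when checking whether events are being queued but not acknowledged by the output.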

bhavaniprasad_reddy · March 19, 2021, 7:02pm

Please find the metrics of filebeat here - filebeat_metrics.log - 0154ccbd
Also check a portion of the log here - filebeat.log - 33312269
Let me know if you need any more details

BenB196 · March 20, 2021, 1:31pm

After looking over the metrics you provided, I'm not seeing anything too concerning. I'm not sure I can be of much more help from here, as I'm at the limit of my current knowledge of how Kubernetes and Filebeat work. The only thing I can recommend is to keep gathering metrics for the Filebeat agents, either via Elasticsearch monitoring or Prometheus monitoring, and hope that either they provide some useful information or someone else comes across this topic and is able to help further.

system · April 17, 2021, 3:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
