I have deployed Filebeat 7.12 in my Kubernetes cluster to collect events from pod logs using autodiscover.
Events are collected in NRT (near real time) for all the pods I want, but after a period of time (1 or 2 hours) collection stops for just 2 to 3 pods.
There are around 260 pods, of which about 240 are regenerated every 10 to 15 minutes. Filebeat harvests, collects, and ships events successfully for those 240 pods and for most of the remaining pods, except the 2 to 3 pods mentioned above.
The behaviour is the same whether Filebeat sends events to Logstash or directly to the console. The missing events for those 2 to 3 pods are only collected at the end, once the 240 pods are no longer being generated.
I then updated the Filebeat configuration to exclude those 240 pods; with that change, events from all the remaining pods are collected in NRT.
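For reference, the exclusion was done by simply removing the condition that matches the ~240 short-lived pods from the autodiscover template (a sketch of the change, using the pod names from my config below):

```yaml
# Sketch: the "ne-ops" condition (which matches the ~240 short-lived pods)
# was removed from condition.or, so only the long-lived pods match the template.
templates:
  - condition.or:
      - contains:
          kubernetes.pod.name: "ne-db-manager"
      - contains:
          kubernetes.pod.name: "ne-mgmt"
      # ... other long-lived pod names kept as before ...
      # - contains:
      #     kubernetes.pod.name: "ne-ops"   # removed for this test
```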
I tried tweaking many parameters, including max_procs, close_inactive, ignore_older, output.logstash.workers, output.logstash.bulk_max_size, queue.mem.events, queue.mem.flush.min_events, and queue.mem.flush.timeout, but none of them resolved the issue.
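For concreteness, one representative combination of those settings looked roughly like this (the values are examples of what I tried, taken from the commented-out lines in my config below; they are not recommendations):

```yaml
# One tuning attempt (illustrative values; none of these fixed the issue)
max_procs: 4
close_inactive: 5m
ignore_older: 10m
queue.mem:
  events: 51200
  flush.min_events: 1600
  flush.timeout: 1s
output.logstash:
  workers: 16
  bulk_max_size: 1600
```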
Resources allocated:
RAM: 2 to 4 GB
CPU: 2 to 4 cores
There are 4 Filebeat pods running on each worker node.
CPU and memory metrics: (screenshots)
Here is the Filebeat configuration I am using:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      tags:
        - "kube-logs"
      templates:
        - condition.or:
            - contains:
                kubernetes.pod.name: "ne-db-manager"
            - contains:
                kubernetes.pod.name: "ne-mgmt"
            - contains:
                kubernetes.pod.name: "list-manager"
            - contains:
                kubernetes.pod.name: "scheduler-mgmt"
            - contains:
                kubernetes.pod.name: "sync-ne"
            - contains:
                kubernetes.pod.name: "file-manager"
            - contains:
                kubernetes.pod.name: "dash-board"
            - contains:
                kubernetes.pod.name: "config-manager"
            - contains:
                kubernetes.pod.name: "report-manager"
            - contains:
                kubernetes.pod.name: "clean-backup"
            - contains:
                kubernetes.pod.name: "warrior"
            - contains:
                kubernetes.pod.name: "ne-ops" # This name matches the ~240 pods
          config:
            - type: container
              paths:
                - "/var/log/containers/*-${data.kubernetes.container.id}.log"
              multiline.type: pattern
              multiline.pattern: '^[[:space:]]'
              multiline.negate: false
              multiline.match: after
              #scan_frequency: 1s
              #close_inactive: 5m
              #ignore_older: 10m

max_procs: 4
filebeat.shutdown_timeout: 5s
logging.level: debug
processors:
  - drop_event:
      when.or:
        - equals:
            kubernetes.namespace: "kube-system"
        - equals:
            kubernetes.namespace: "default"
        - equals:
            kubernetes.namespace: "logging"
  - fingerprint:
      fields: ["message"]
      target_field: "@metadata._id"
output.logstash:
  hosts: ["logstash-headless.logging:5044"]
  #, "logstash-headless.logging:5045"]
  #loadbalance: true
  #workers: 16
  index: filebeat
  pretty: false
  #bulk_max_size: 1600
  #compression_level: 9

#queue.mem:
#  events: 51200
#  flush.min_events: 1600
#  flush.timeout: 1s

setup.template.name: "filebeat"
setup.template.pattern: "filebeat-*"
I want events from all the pods to be collected in NRT. Any suggestions would be appreciated.