We are running a logging load test against our filebeat configuration and have observed that it reliably drops a small percentage of lines from the head of the first files it sees.
Filebeat is deployed to a 3 node GKE cluster as a DaemonSet. GKE is rotating our files every 10MB.
Our log load test is set up as follows (config at the end of the post):
- One Kubernetes job per node, for a total of 3 jobs.
- Each job has 10 containers, which creates 10 log files for filebeat to tail on each node.
- Each container sleeps randomly between 0-10 seconds, after which it starts logging at a rate of 550 lines/second until it reaches 99k lines.
- This results in an overall expected ingestion rate for a single filebeat of 5.5k lines/s, and a total of 990,000 lines ingested per beat.
- The expected ingestion total for the load test is 2,970,000 lines.
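For clarity, each load-test container's logging loop amounts to something like the sketch below. This is not our actual image; the line format and function name are illustrative, but the rate, counts, and random start delay match the setup above:

```python
import random
import sys
import time

def emit(total_lines, rate, initial_delay, out=sys.stdout):
    """Sketch of one load-test container: numbered lines at a fixed rate."""
    time.sleep(initial_delay)
    for total in range(1, total_lines + 1):
        # The total=N counter is what lets us audit ingestion afterwards.
        out.write(f'load test filler text total={total}\n')
        out.flush()
        time.sleep(1.0 / rate)

# In each container: emit(99_000, 550, random.uniform(0, 10))
```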
We have observed that a reliably small number of log lines fail to be ingested (somewhere in the neighborhood of 0.004%, or 80-200 lines). After some parsing of the resulting data, we determined that the dropped lines were all coming from the head of the initial file created by the load test. (TL;DR: we figured this out by inserting a total=X string in each log message and setting the expectation that there would be exactly 30 total=1 lines, 30 total=2 lines, and so on up to 30 total=99000 lines. Again, 30 containers each emitting 99k lines.)
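The audit over the exported data reduces to something like this (a minimal sketch; it assumes you have already parsed the total=N value out of every ingested line):

```python
from collections import Counter

def find_shortfalls(totals, expected=30, max_total=99_000):
    """Given the total=N values parsed from every ingested line, return
    {N: number of missing copies} for each N whose count is off."""
    counts = Counter(totals)
    return {n: expected - counts[n]
            for n in range(1, max_total + 1)
            if counts[n] != expected}
```

In our runs, the shortfalls clustered at low N values, which is how we traced the drops to the head of the first file.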
It is also interesting that, after that initial drop, filebeat did not miss a single line for the remainder of the test.
I really can't believe that the events are being dropped from the head of the first file generated by the load test (I expected it would be at the tail of the last file), but here we are.
Has anybody else experienced this before and been able to track down a solution?
Our filebeat config file:
```yaml
logging:
  level: info
  to_syslog: true

http:
  enabled: true

queue.mem:
  events: 90000           # flush.events * 15s
  flush.timeout: 5s
  flush.min_events: 6000  # bulk_max_size * workers

processors:
  - add_cloud_metadata:
      overwrite: true

filebeat.inputs:
  - type: docker
    enabled: true
    containers.ids:
      - '*'
    exclude_files:
      - 'filebeat'
      - 'POD'
      - 'kube-system'
    scan_frequency: 2s   # how often to scan for new files
    close_inactive: 90s  # close harvesters after file inactivity
    close_removed: true  # https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-close-removed
    clean_removed: true  # https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-clean-removed
    harvester_limit: 80  # File handler limit
    processors:
      - add_kubernetes_metadata:
          in_cluster: true
      - drop_event:
          when:
            not:
              equals:
                kubernetes.labels.app: "logload"
      # Decode structured logs
      - decode_json_fields:
          fields: ["message"]
          target: ""  # move to root of document
      - rename:
          ignore_missing: true
          fail_on_error: false
          fields:
            - from: "level"
              to: "severity"
```