Filebeat reliably dropping lines from head of first new files it sees during load testing

bschaeffer · June 12, 2019, 8:28pm

We are running a logging load test on our filebeat configuration and have observed that it is reliably dropping a small percentage of lines from the head of the first files it is seeing.

Our Setup

Filebeat is deployed to a 3 node GKE cluster as a DaemonSet. GKE is rotating our files every 10MB.

Our log load test is setup as follows (config at the end of the post):

Deployed as a single Kubernetes job run on each node for a total of 3 jobs.
Each job has 10 containers, which creates 10 log files for filebeat to tail on each node.
Each container sleeps randomly between 0-10 seconds, after which it starts logging at a rate of 550 lines/second until it reaches 99k lines.
This results in an overall expected ingestion rate for a single filebeat of 5.5k lines/s, and a total of 990,000 lines ingested per beat.
The expected ingestion total for the load test is 2,970,000 lines.

We have observed that a reliably small number of log lines have failed to be ingested (somewhere in the neighborhood of 0.004% or 80-200 lines). After some parsing of the resulting data, we were able to determine that dropped lines were coming from the head of the initial file created by the load test. (TLDR, we figured this out by inserting a total=X string in our log message and setting an expectation that there would be 30 total=1 lines, 30 total=2 lines all the way up to 30 total=99000 lines. Again, 30 containers emitting 99k lines).

It is also interesting that, after that initial drop, filebeat did not miss a single line for the remainder of the test.

I really can't believe that the events are being dropped from the head of the first file generated by the load test (I expected it would be at the tail of last file), but here we are.

Has anybody else experience this before and been able to track down a solution?

Our filebeat config file:

logging:
  level: info
  to_syslog: true

http:
  enabled: true

queue.mem:
  events: 90000 # flush.events * 15s
  flush.timeout: 5s
  flush.min_events: 6000 # bulk_max_size * workers

processors:
  - add_cloud_metadata:
      overwrite: true

filebeat.inputs:
  - type: docker
    enabled: true
    containers.ids:
      - '*'
    exclude_files:
      - 'filebeat'
      - 'POD'
      - 'kube-system'

    scan_frequency: 2s   # how often to scan for new files
    close_inactive: 90s  # close harvesters after file inactivity
    close_removed: true  # https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-close-removed
    clean_removed: true  # https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-clean-removed
    harvester_limit: 80  # File handler limit

    processors:
      - add_kubernetes_metadata:
          in_cluster: true

      - drop_event:
          when:
            not:
              equals:
                kubernetes.labels.app: "logload"

      # Decode structured logs
      - decode_json_fields:
          fields: ["message"]
          target: "" # move to root of document

      - rename:
          ignore_missing: true
          fail_on_error: false
          fields:
            - from: "level"
              to: "severity"

bschaeffer · June 13, 2019, 4:12pm

Solved: the add_kubernetes_metadata processors silently not tagging lines from files it doesn't have metadata about yet, so our load test was dropping them on the floor.

I thought it would at least write an error or emit a a debug line but I haven't seen anything.

bschaeffer · June 18, 2019, 1:14pm

FWIW: https://github.com/elastic/beats/issues/6042

system · July 16, 2019, 1:14pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Filebeat failing way way behind when shipping output Beats filebeat	1	293	July 3, 2019
Filebeat appears to drop files randomly Beats filebeat	8	1727	October 9, 2017
Filebeat is skipping lines randomly Beats filebeat	2	1258	October 20, 2017
Filebeat missing log lines Beats filebeat	16	8375	July 5, 2017
Filebeat cannot collect the first few log lines Beats filebeat	1	34	November 4, 2024

Filebeat reliably dropping lines from head of first new files it sees during load testing

Related topics