Filebeat disk queue not working as documented

According to the Internal Queue documentation:

Queue data is deleted from disk after it has been successfully sent to the output.

This is not happening. Sent data still exists in the segment files (*.seg).
To make matters worse, new log events are still being written to the queue files even though they are sent successfully.

The documentation is a bit misleading here. The disk queue is really a sequence of files (segments), with the size of each segment controlled by the segment_size setting documented here: Configure the internal queue | Filebeat Reference [8.3] | Elastic

Events aren't deleted individually as they are sent; a segment file is deleted only once all the events in that segment have been sent.

What I have done, and the results:

  • I stopped our Elasticsearch server
  • executed the application to generate some logs
  • Filebeat stored the events in ./data/diskqueue
-rw------- 1 user user  77K Jun 30 14:18 0.seg
-rw------- 1 user user   28 Jun 30 15:17 state.dat
  • started the Elasticsearch server
  • I verified that the logs had been sent to the server
  • executed the application again
  • events are still being written to the diskqueue folder. No deletion occurred.
-rw------- 1 user user  77K Jun 30 14:18 0.seg
-rw------- 1 user user  22K Jun 30 14:46 1.seg
-rw------- 1 user user   28 Jun 30 15:17 state.dat
  • manually restarted the server where the Filebeat agent was running (not related to the "problem")
  • generated some log events
-rw------- 1 user user 121K Jun 30 16:15 2.seg
-rw------- 1 user user   28 Jun 30 16:15 state.dat

Generating more logs just increases the size of the 2.seg file. (The Elasticsearch server is online and receiving all events.)

I'm finding it hard to understand how this really works. Sorry.

Update: Today (01/07/2022) I generated more log events. The 2.seg file just keeps growing.

My filebeat.yml:

queue.disk:
  max_size: 10GB

setup.template.enabled: false
setup.ilm.enabled: false

filebeat.inputs:
- type: filestream
  id: input-log-json
  enabled: true
  paths:
    - /app_path/app.log.json
  parsers:
    - ndjson:
        target: ""

output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  api_key: "randomid:randomkey"
  ssl:
    enabled: true
    certificate_authorities: ["/path/elastic-certs/ca/ca.crt"]
  indices:
    # The condition key was lost from the posted config; when.equals is assumed.
    - index: "filebeat-logs-audit-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.equals:
        log.level: "INFO"
    - index: "filebeat-logs-error-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.equals:
        log.level: "ERROR"

Update from last Friday (01/07/2022)...

At the end of my workday:

-rw------- 1 user user  23M Jul  1 13:15 0.seg
-rw------- 1 user user   28 Jul  1 13:15 state.dat

Today (04/07/2022):

-rw------- 1 user user 230M Jul  4 07:58 1.seg
-rw------- 1 user user   28 Jul  4 07:58 state.dat

A segment file is deleted when all of the events currently in the segment file have been acknowledged by the Elasticsearch output. This can happen at any point in time, depending on the rate at which data enters the queue relative to the rate at which it is marked as acknowledged (or consumed) in the queue.

A segment can be up to queue.disk.segment_size bytes in size, which defaults to queue.disk.max_size / 10. In this case, with max_size set to 10GB, a segment can grow to 1GB before the queue forces a new segment to be created.
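
If the aim is to see segment files roll over (and become eligible for deletion) sooner than at 1GB, one option is to set the segment size explicitly. This is only a sketch, assuming the 10GB max_size from the posted configuration; the 100MB value is an arbitrary illustration, not a recommendation:

```yaml
queue.disk:
  # Upper bound on the total disk space used by all segment files.
  max_size: 10GB
  # Illustrative value (an assumption): start a new segment file after
  # roughly 100MB instead of the default of max_size / 10 (1GB here).
  segment_size: 100MB
```

With a smaller segment, the file currently being written fills up sooner, so the queue moves on to a new segment earlier and the previous one can be removed once all of its events are acknowledged.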

As far as I can tell, the segment files are eventually being deleted, unless I'm misinterpreting your comments. The only conditions the queue is guaranteed to maintain are:

  1. The total disk space used by the queue segment files must be less than or equal to the configured max_size.
  2. The total size of each segment file must be less than the configured segment_size.

The rest of the queue behaviour is really an implementation detail.
