[Filebeat] duplicate output when using queue.disk and restarting

I am facing a problem with duplicate output when using queue.disk with Filebeat and restarting.

The contents of filebeat.yml and the operating procedure are as follows

filebeat.yml

filebeat.inputs:
- type: filestream
  id: test-id
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /tmp/test.log

output.file:
  path: "/tmp/"
  filename: result
  permissions: 0640
processors:
  - drop_fields:
      fields: ["agent.id", "agent.type", "agent.version", "agent.hostname", "agent.ephemeral_id", "log.offset"]

logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 30
  permissions: 0640

monitoring.enabled: false
monitoring.cluster_uuid: "XXXXXXXXXXXXXXXXXXXXX"
path.data: /data/filebeat/
http.enabled: true
queue.disk:
  max_size: 1GB
filebeat.shutdown_timeout: 30s

1.Enter the first log in the target file.
/tmp/test.log
2022-07-29 09:00:00.000 0000001

/tmp/result
{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}

/data/filebeat/diskqueue

Total 8
-rw------- 1 root root 216 Jul 29 16:41 0.seg
-rw------- 1 root root 28 Jul 29 16:41 state.dat

2.Enter the second log in the target file
/tmp/test.log

2022-07-29 09:00:00.000 0000001
2022-07-29 09:01:00.000 0000002

/tmp/result

{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}
{"@timestamp": "2022-07-29T07:43:37.989Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "ecs":{"version": "1.12. 0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:01:00.000 0000002", "input":{"type": "filestream"}}

/data/file-beat/disk-queue

Total 8
-rw------- 1 root root 424 Jul 29 16:43 0.seg
-rw------- 1 root root 28 Jul 29 16:43 state.dat

3.Restart filebeat
# sudo systemctl restart filebeat.service

4.Enter the third log in the target file
/tmp/test.log

2022-07-29 09:00:00.000 0000001
2022-07-29 09:01:00.000 0000002
2022-07-29 09:02:00.000 0000003

/tmp/result
{"@timestamp": "2022-07-29T07:46:13.104Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "message": "2022-07-29 09:02:00. 000 0000003", "input":{"type": "filestream"}, "ecs":{"version": "1.12.0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}

/tmp/result.1

{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}
{"@timestamp": "2022-07-29T07:43:37.989Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "ecs":{"version": "1.12. 0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:01:00.000 0000002", "input":{"type": "filestream"}}

/data/file-beat/disk-queue

Total 12
-rw------- 1 root root 424 Jul 29 16:43 0.seg
-rw------- 1 root root 216 Jul 29 16:46 1.seg
-rw------- 1 root root 28 Jul 29 16:46 state.dat

Segments are divided

5.Restart filebeat
# sudo systemctl restart filebeat.service

/tmp/result
{"@timestamp": "2022-07-29T07:46:13.104Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "log":{"file":{"path":"/tmp/test. log"}}, "message": "2022-07-29 09:02:00.000 0000003", "input":{"type": "filestream"}, "ecs":{"version": "1.12.0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}}
[DUPLICATED]Same contents as /tmp/result.1

/tmp/result.1
{"@timestamp": "2022-07-29T07:46:13.104Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "message": "2022-07-29 09:02:00. 000 0000003", "input":{"type": "filestream"}, "ecs":{"version": "1.12.0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}

/tmp/result.2

{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}
{"@timestamp": "2022-07-29T07:43:37.989Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "ecs":{"version": "1.12. 0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:01:00.000 0000002", "input":{"type": "filestream"}}

After this, "2022-07-29 09:02:00.000 0000003" is output every time filebeat is restarted.

Is this correct behavior?
Also, please point out if there are any mistakes in the settings, etc.

Hmmm, I get the same result no matter how many times I try.
Is this correct behavior? Or is it a bug?

If these two lines are commented out and a memory queue is used, this phenomenon does not occur.

Hi @Yos, Thanks for noticing and reporting this. Indeed it seems to be a bug with the disk queue.

Thanks a lot for the detailed setp-by-step to reproduce the bug!

I'll open a issue on our GitHub today.

Hi @TiagoQueiroz , Thanks for your response and opening a issue(32560) on our GitHub.

1 Like