I am facing a problem with duplicate output when using queue.disk with Filebeat and restarting.
The contents of filebeat.yml and the operating procedure are as follows
filebeat.yml
filebeat.inputs:
- type: filestream
id: test-id
enabled: true
# Paths that should be crawled and fetched. Glob based paths.
paths:
- /tmp/test.log
output.file:
path: "/tmp/"
filename: result
permissions: 0640
processors:
- drop_fields:
fields: ["agent.id", "agent.type", "agent.version", "agent.hostname", "agent.ephemeral_id", "log.offset"]
logging.level: debug
logging.to_files: true
logging.files:
path: /var/log/filebeat
name: filebeat
keepfiles: 30
permissions: 0640
monitoring.enabled: false
monitoring.cluster_uuid: "XXXXXXXXXXXXXXXXXXXXX"
path.data: /data/filebeat/
http.enabled: true
queue.disk:
max_size: 1GB
filebeat.shutdown_timeout: 30s
1.Enter the first log in the target file.
/tmp/test.log
2022-07-29 09:00:00.000 0000001
/tmp/result
{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}
/data/filebeat/diskqueue
Total 8
-rw------- 1 root root 216 Jul 29 16:41 0.seg
-rw------- 1 root root 28 Jul 29 16:41 state.dat
2.Enter the second log in the target file
/tmp/test.log
2022-07-29 09:00:00.000 0000001
2022-07-29 09:01:00.000 0000002
/tmp/result
{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}
{"@timestamp": "2022-07-29T07:43:37.989Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "ecs":{"version": "1.12. 0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:01:00.000 0000002", "input":{"type": "filestream"}}
/data/file-beat/disk-queue
Total 8
-rw------- 1 root root 424 Jul 29 16:43 0.seg
-rw------- 1 root root 28 Jul 29 16:43 state.dat
3.Restart filebeat
# sudo systemctl restart filebeat.service
4.Enter the third log in the target file
/tmp/test.log
2022-07-29 09:00:00.000 0000001
2022-07-29 09:01:00.000 0000002
2022-07-29 09:02:00.000 0000003
/tmp/result
{"@timestamp": "2022-07-29T07:46:13.104Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "message": "2022-07-29 09:02:00. 000 0000003", "input":{"type": "filestream"}, "ecs":{"version": "1.12.0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}
/tmp/result.1
{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}
{"@timestamp": "2022-07-29T07:43:37.989Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "ecs":{"version": "1.12. 0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:01:00.000 0000002", "input":{"type": "filestream"}}
/data/file-beat/disk-queue
Total 12
-rw------- 1 root root 424 Jul 29 16:43 0.seg
-rw------- 1 root root 216 Jul 29 16:46 1.seg
-rw------- 1 root root 28 Jul 29 16:46 state.dat
Segments are divided
5.Restart filebeat
# sudo systemctl restart filebeat.service
/tmp/result
{"@timestamp": "2022-07-29T07:46:13.104Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "log":{"file":{"path":"/tmp/test. log"}}, "message": "2022-07-29 09:02:00.000 0000003", "input":{"type": "filestream"}, "ecs":{"version": "1.12.0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}}
[DUPLICATED]Same contents as /tmp/result.1
/tmp/result.1
{"@timestamp": "2022-07-29T07:46:13.104Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "message": "2022-07-29 09:02:00. 000 0000003", "input":{"type": "filestream"}, "ecs":{"version": "1.12.0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}
/tmp/result.2
{"@timestamp": "2022-07-29T07:41:53.980Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "input":{"type": "filestream"}, "agent":{"name": "XXXXX01"}, "ecs":{"version": "1. 12.0"}, "host":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:00:00.000 0000001"}
{"@timestamp": "2022-07-29T07:43:37.989Z","@metadata":{"beat": "filebeat", "type":"_doc", "version": "7.17.5"}, "ecs":{"version": "1.12. 0"}, "host":{"name": "XXXXX01"}, "agent":{"name": "XXXXX01"}, "log":{"file":{"path":"/tmp/test.log"}}, "message": "2022-07-29 09:01:00.000 0000002", "input":{"type": "filestream"}}
After this, "2022-07-29 09:02:00.000 0000003" is output every time filebeat is restarted.
Is this correct behavior?
Also, please point out if there are any mistakes in the settings, etc.