Filestream input sends duplicate events on restart and during operation

We run more than 1,800 Filebeat instances with the filestream input on this version:

$ filebeat version
filebeat version 8.7.0 (amd64), libbeat 8.7.0 [a8dbc6c06381f4fe33a5dc23906d63c04c9e2444 built 2023-03-23 00:37:07 +0000 UTC]

Around 20 of them resend the collected files during startup and operation.

Every filestream input has a unique ID in its configuration.

Filebeat logs that the file has been truncated:
File was truncated. Reading file from offset 0. Path=/sample.log

We are sure that the file has not been truncated: we checked its attributes with the stat command. The file size never gets smaller, and even the device ID and inode have not changed.

The message comes from this line of code, where fi.Size() is compared with the offset:

We built a debug Filebeat with the code changed to log the values of fi.Size() and offset:

if fi.Size() < offset {
        log.Infof("File was truncated. Reading file from offset 0. Path=%s (fi.Size=%v, offset=%v)", path, fi.Size(), offset)

This logs:

2023-06-01T11:10:03.778+0200 info File was truncated. Reading file from offset 0. Path=/sample.log (fi.Size=14721226, offset=1054936285)

fi.Size() is the size of the file as determined with the stat command. offset comes from the registry.

Is fi.Size() < offset the right criterion to indicate that a file has been truncated?
Why does Filebeat resend the log messages over and over again?
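For reference, the check boils down to a size-versus-offset comparison. The following Go sketch is a simplified stand-in for that criterion (not the actual Filebeat code) and shows how a stale registry offset makes it fire even though the file never shrank:

```go
package main

import "fmt"

// looksTruncated mirrors the criterion quoted above: a file is
// treated as truncated when its current size is smaller than the
// offset stored in the registry. A stale or corrupted offset thus
// yields a false positive even if the file only ever grew.
func looksTruncated(size, offset int64) bool {
	return size < offset
}

func main() {
	// Values from the debug output above.
	fmt.Println(looksTruncated(14721226, 1054936285)) // true: flagged as truncated
	fmt.Println(looksTruncated(14721226, 14721226))   // false: offset matches size
}
```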

Resending messages is a big problem for us.


By the way, the deprecated log input of Filebeat works fine.

When everything works fine, the offset matches the file size fi.Size().

In the cases where filestream resends events, we see that the offset is much greater than the file size.

Does anybody have a hint about what causes the offset entry to be wrong?

Hi @michaelbu,

  1. Could you share your full Filebeat configuration (redact any sensitive information like the output credentials)?
  2. Could you also provide some information about your system? OS, Filesystem used, etc?
  3. Have you run Filebeat with debug logging enabled? Have you seen any error or warning in the logs?
  4. Which application are you collecting logs from? How does it handle its logs and log rotation?
  5. Could you run Filebeat with the following logging configuration? It will not log any of the collected events, and it also excludes some very verbose loggers that are not interesting for us at the moment.
logging:
  level: debug
  selectors:
    - file_watcher
    - harvester
    - input
    - input.filestream

Based on the description, Filebeat is seeing the file as truncated; what we need to find out is why.

Hi @TiagoQueiroz,

thanks for your answer. Let me try to answer your questions:

  1. Filebeat is under the control of graylog-sidecar. This is the config:
$ cat /var/lib/graylog-sidecar/generated/filebeat.conf
fields.collector_node_id: XXX
fields.gl2_source_collector: XXX
fields.source: XXX
fields_under_root: true

logging:
  level: info
  selectors: ["*"]
  to_files: true
  to_syslog: false
  files:
    name: filebeat
    rotateeverybytes: 10485760
    keepfiles: 2
  metrics:
    enabled: false

output.logstash:
  hosts: ["XXX", "XXX", "XXX", "XXX", "XXX", "XXX", "XXX", "XXX"]
  loadbalance: true
  ssl.verification_mode: full

path:
  data: /var/lib/graylog-sidecar/collectors/filebeat/data
  logs: /var/lib/graylog-sidecar/collectors/filebeat/log
  config: /etc/filebeat

filebeat.inputs:

- type: filestream
  id: XXX-agent
  enabled: true
  encoding: utf-8
  parsers:
  - multiline:
      type: pattern
      pattern: ^\d{4}-\d{2}-\d{2}
      negate: true
      match: after
  paths:
  - /opt/XXX/XXX/agent/logs/agent.log
  prospector.scanner.check_interval: 10s
  ignore_inactive: since_last_start
  fields_under_root: true
  fields:
    type: log
    application_name: XXX-agent

- type: filestream
  id: filebeat
  enabled: true
  encoding: plain
  parsers:
  - ndjson:
    keys_under_root: true
    overwrite_keys: true
    add_error_key: true
  processors:
  - drop_event:
      when:
        or:
          - equals:
              message: "Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning."
          - equals:
              message: "DEPRECATED: Treating the CommonName field on X.509 certificates as a host name when no Subject Alternative Names are present is going to be removed. Please update your certificates if needed. Will be removed in version: 8.0.0"
          - equals:
              message: "EXPERIMENTAL: The journald input is experimental"
          - equals:
              message: "File was truncated. Reading file from offset 0. Path=/var/log/graylog-sidecar/filebeat_stderr.log"
          - and:
            - equals:
                log.level: "info"
            - not:
                contains:
                  message: "File was truncated. Reading file from offset "
  - drop_fields:
      when:
        has_fields: ['ecs.version']
      fields: ["ecs.version"]
      ignore_missing: true
  paths:
  - /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat*.ndjson
  prospector.scanner.check_interval: 10s
  ignore_inactive: since_last_start
  fields_under_root: true
  fields:
    type: log
    application_name: filebeat

- type: filestream
  id: graylog-sidecar
  enabled: true
  encoding: utf-8
  paths:
  - /var/log/graylog-sidecar/*.log
  prospector.scanner.check_interval: 10s
  ignore_inactive: since_last_start
  fields_under_root: true
  fields:
    type: log
    application_name: linux-log-management

- enabled: true
  type: journald
  id: journald
  paths: [ "/var/log/journal" ]
  seek: cursor
  processors:
  - drop_fields:
      fields: ["event","host"]
      ignore_missing: true
  - drop_event:
      when:
        and:
          - or:
            - equals:
                syslog.identifier: "goferd"
          - range:
              syslog.priority:
                gte: 6

- enabled: true
  id: db2diag.log
  type: filestream
  encoding: plain
  fields:
    application_name: db2
    type: log
  fields_under_root: true
  parsers:
  - multiline:
      type: pattern
      match: after
      negate: true
      pattern: ^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}
  paths:
  - /db2/*/sqllib/db2dump/db2diag.log
  prospector.scanner.check_interval: 1s
  ignore_inactive: since_last_start

- enabled: true
  id: db2-auditing
  type: filestream
  encoding: plain
  fields:
    application_name: db2-auditing
    type: log
  fields_under_root: true
  parsers:
  - multiline:
      type: pattern
      match: after
      negate: true
      pattern: ^timestamp=\d{4}-\d{2}-\d{2}
  paths:
  - /var/db2/auditlog/*/db2audit_extract_db-*.log
  - /var/db2/auditlog/*/db2audit_extract_instance-*.log
  prospector.scanner.check_interval: 2s
  ignore_inactive: since_last_start

- enabled: true
  id: db2-backup
  type: filestream
  encoding: plain
  fields:
    application_name: db2-backup
    type: log
  fields_under_root: true
  paths:
  - /var/db2/backup/*/log/db2backup.*.log
  prospector.scanner.check_interval: 2s
  prospector.scanner.symlinks: false
  ignore_inactive: since_last_start

- enabled: true
  id: /var/log/dnf.log
  type: filestream
  encoding: plain
  exclude_lines:
  - .*DEBUG.*
  fields:
    application_name: dnf
    type: log
  fields_under_root: true
  parsers:
  - multiline:
      type: pattern
      match: after
      negate: true
      pattern: '\d{4}-\d{2}-\d{2}T\d{2}:'
  paths:
  - /var/log/dnf.log
  prospector.scanner.check_interval: 10s
  ignore_inactive: since_last_start
  • Operating system is "Red Hat Enterprise Linux" version 8.7
  • Filesystem is xfs
  • Filebeat was installed from the official RPM package:
    filebeat version 8.7.0 (amd64), libbeat 8.7.0 [a8dbc6c06381f4fe33a5dc23906d63c04c9e2444 built 2023-03-23 00:37:07 +0000 UTC]
  3. There are no errors and no relevant warnings, only the deprecation message.

  4. The application is Db2. As far as we know, the logs are never rotated; they are only appended to.
    To make sure this assumption holds, we created a script that runs in a loop and checks the file size of all the log files every 0.6 seconds. The script has been running for more than a day and was not able to detect any file whose size got smaller. So there is no truncation.

  5. Because these are production systems, we switched back from the filestream input to the log input, which does not have this issue.

If you are interested in debug-logs, please give me some time to reproduce this issue in a test environment.


Thanks for the quick answers!

I'll wait for the debug logs. If you could also include the output of the script that monitors the file size, that would be awesome.

Btw, having a timestamp as part of the script output is going to be super helpful for correlating its output with the logs.
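A minimal sketch of such a monitor in Go, with timestamped output (the path and the poll interval are placeholders, not taken from the original script):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// shrank reports whether the newly observed size is smaller than
// the previous one; prev < 0 means "no previous observation yet".
func shrank(prev, cur int64) bool {
	return prev >= 0 && cur < prev
}

func main() {
	path := "/sample.log" // placeholder path
	var last int64 = -1
	for {
		if fi, err := os.Stat(path); err == nil {
			if shrank(last, fi.Size()) {
				// Timestamped line, so it can be correlated with the Filebeat logs.
				fmt.Printf("%s TRUNCATED %s: %d -> %d\n",
					time.Now().Format(time.RFC3339Nano), path, last, fi.Size())
			}
			last = fi.Size()
		}
		time.Sleep(600 * time.Millisecond)
	}
}
```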

Hi @TiagoQueiroz,

unfortunately, I have not been able to reproduce this with debug enabled in a test environment so far.

This morning I am seeing this in our production environment, where debug is not enabled:

Filebeat has been running since 01 June:

-bash-4.2$ ls -alh log
-rw------- 1 graylog graylog  11M Jun  6 05:25 filebeat-20230601-1.ndjson
-rw------- 1 graylog graylog 5.0K Jun  1 09:56 filebeat-20230601.ndjson
-rw------- 1 graylog graylog 389K Jun  6 09:45 filebeat-20230606.ndjson

-bash-4.2$ ps -ef | grep filebeat
graylog   81071   2279  1 Jun01 ?        02:06:16 /usr/share/filebeat/bin/filebeat -c /var/lib/graylog-sidecar/generated/filebeat.conf --path.home /usr/share/filebeat

Filebeat collects its own log files:

-bash-4.2$ cat filebeat-20230606.ndjson | jq -r '. | "\(."@timestamp") \(."log.level") \(.message)"'
2023-06-06T08:15:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:15:38.066+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:15:39.093+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:19:03.913+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:06.006+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:06.006+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:08.012+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:08.012+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:08.012+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:08.012+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:08.012+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:08.012+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:20:08.012+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:25:19.589+0200 info Reader was closed. Closing.
2023-06-06T08:25:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:25:35.055+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:25:38.044+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:25:38.051+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:25:41.075+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:25:48.953+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:25:51.075+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:25:58.045+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:26:05.068+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:26:09.041+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:28:06.758+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:04.505+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:30:06.969+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:06.969+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:06.969+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:06.969+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:06.969+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:06.969+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:08.972+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:08.972+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:30:08.972+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:35:13.005+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:13.628+0200 info Reader was closed. Closing.
2023-06-06T08:35:13.806+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:13.990+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:13.991+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:15.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:15.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:15.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:16.804+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:35:38.005+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:35:38.999+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:36:07.545+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:37:04.144+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:38:59.009+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:39:01.813+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:05.931+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:05.931+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:05.931+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:05.932+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:05.932+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:07.935+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:07.935+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:07.935+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:07.935+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:40:07.935+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:45:12.856+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:12.981+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:13.609+0200 info Reader was closed. Closing.
2023-06-06T08:45:13.983+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:15.973+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:15.973+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:17.610+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:18.976+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:19.988+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:45:38.004+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:45.974+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:45:58.544+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:48:04.758+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:48:59.016+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:49:14.994+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:50:06.947+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:06.947+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:50:08.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:55:13.622+0200 info Reader was closed. Closing.
2023-06-06T08:55:13.992+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:14.827+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:14.994+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:17.006+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:17.068+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:17.993+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:17.993+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:18.610+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:19.985+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:23.531+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:55:41.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:55:41.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:56:43.596+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:58:03.733+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T08:59:02.023+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T08:59:32.025+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:00:07.949+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:07.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:07.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:07.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:07.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:07.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:07.962+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:07.969+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:09.966+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:09.966+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:00:09.966+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:05:13.994+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:14.773+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:15.000+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:15.000+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:15.000+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:15.000+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:15.000+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:16.012+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:16.012+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:05:21.628+0200 info Reader was closed. Closing.
2023-06-06T09:05:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:05:30.522+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:05:40.020+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:07:24.689+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:07:53.659+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:08:04.753+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.972+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.972+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.972+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.973+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.973+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.973+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.973+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:10:06.974+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:15:13.017+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:13.029+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:14.732+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:14.793+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:15.613+0200 info Reader was closed. Closing.
2023-06-06T09:15:16.022+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:16.022+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:19.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:15:35.056+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:15:38.013+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:16:07.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:18:05.751+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:18:27.787+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:18:59.099+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:19:05.594+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:20:07.959+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.959+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.959+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.960+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.960+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.960+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.960+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.960+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.960+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:20:07.961+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:25:13.009+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:25:13.988+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:25:13.988+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:25:15.613+0200 info Reader was closed. Closing.
2023-06-06T09:25:15.782+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:25:16.995+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:25:17.833+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:25:23.531+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:25:37.998+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:25:39.009+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:26:00.562+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:26:08.010+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:26:09.009+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:28:04.759+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:28:58.040+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:29:09.041+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:30:08.954+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:35:14.990+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:14.993+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:14.993+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:15.792+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:16.983+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:16.987+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:16.987+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:16.987+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:20.615+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:39.470+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:35:48.980+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:35:53.496+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:36:05.519+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:38:03.708+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:38:50.007+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:39:01.818+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:39:03.822+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:40:07.925+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:40:07.925+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:40:07.926+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:40:07.926+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:40:07.926+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:45:12.953+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:12.953+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:13.668+0200 info Reader was closed. Closing.
2023-06-06T09:45:13.748+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:13.855+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:15.948+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:18.554+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:19.515+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:21.872+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:23.532+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:45:37.959+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:45:40.084+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:48:04.694+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:48:47.973+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:48:50.605+0200 info File is inactive. Closing because close_inactive of 5m0s reached.
2023-06-06T09:50:06.883+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.883+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.883+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.884+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.884+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.884+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.884+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.884+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:06.884+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:08.887+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]
2023-06-06T09:50:08.888+0200 info Harvester started for paths: [/db2/*/sqllib/db2dump/db2diag.log]

Here are the "truncated"-messages filtered from the log:

-bash-4.2$ cat filebeat-20230606.ndjson | jq -r '. | "\(."@timestamp") \(."log.level") \(.message)"' | grep trunc
2023-06-06T05:25:53.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T05:45:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T06:15:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T06:25:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T06:35:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T06:45:23.531+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T06:55:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T07:05:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T07:15:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T07:25:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T07:55:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:05:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:15:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:25:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:35:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:45:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T08:55:23.531+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:05:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:15:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:25:23.531+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-06T09:45:23.532+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson

Remarkable is the pattern of mostly 10-minute intervals between the entries.

Here are the attributes of the logfile:

-bash-4.2$ stat /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
  File: ‘/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson’
  Size: 406492          Blocks: 896        IO Block: 4096   regular file
Device: fd28h/64808d    Inode: 71          Links: 1
Access: (0600/-rw-------)  Uid: (  389/ graylog)   Gid: (  385/ graylog)
Access: 2023-06-06 09:50:09.595512897 +0200
Modify: 2023-06-06 09:50:08.886518646 +0200
Change: 2023-06-06 09:50:08.886518646 +0200
 Birth: -

And this is the registry:

-bash-4.2$ ls -alh ../data/registry/filebeat/
total 8.3M
drwxr-x--- 2 graylog graylog  158 Jun  6 09:50 .
drwxr-x--- 3 graylog graylog   22 Apr 26 17:26 ..
-rw------- 1 graylog graylog 172K Jun  6 09:50 37999763.json
-rw------- 1 graylog graylog 163K Apr 26 17:26 572.json-1682522782670094302.bak
-rw------- 1 graylog graylog   81 Jun  6 09:50 active.dat
-rw------- 1 graylog graylog 7.3M Jun  6 09:56 log.json
-rw------- 1 graylog graylog    0 Apr 26 17:26 log.json-1682522782670294442.bak
-rw------- 1 graylog graylog   16 Apr 26 17:26 meta.json

The content of the registry:

-bash-4.2$ grep -r /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson .
./log.json:{"k":"filestream::filebeat::native::71-64808","v":{"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"},"ttl":1800000000000,"updated":[515926818243,1686038123],"cursor":{"offset":5585756}}}
./log.json:{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[515947789710,1686038123],"cursor":{"offset":5995709},"meta":{"identifier_name":"native","source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson"}}}
./log.json:{"k":"filestream::filebeat::native::71-64808","v":{"cursor":{"offset":5996334},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"},"ttl":1800000000000,"updated":[515950719029,1686038137]}}
./log.json:{"k":"filestream::filebeat::native::71-64808","v":{"cursor":{"offset":5996962},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"},"ttl":1800000000000,"updated":[515951807862,1686038139]}}
./log.json:{"k":"filestream::filebeat::native::71-64808","v":{"cursor":{"offset":5997589},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"},"ttl":1800000000000,"updated":[515953167899,1686038145]}}
./log.json:{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[515955793284,1686038169],"cursor":{"offset":5998214},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"}}}
./log.json:{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[515966328886,1686038293],"cursor":{"offset":5998839},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"}}}
./37999763.json:{"_key":"filestream::filebeat::native::71-64808","ttl":1800000000000,"updated":[515993614898,1686037809],"cursor":{"offset":5583874},"meta":{"identifier_name":"native","source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson"}},
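For reference, one of these log.json lines can be decoded with a small Go sketch (the struct fields below are inferred from the output above, not taken from Filebeat's source):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// registryEntry mirrors the shape of one log.json line as shown above
// (field names inferred from that output; this is not Filebeat's own type).
type registryEntry struct {
	K string `json:"k"`
	V struct {
		TTL    int64 `json:"ttl"`
		Cursor struct {
			Offset int64 `json:"offset"`
		} `json:"cursor"`
		Meta struct {
			Source         string `json:"source"`
			IdentifierName string `json:"identifier_name"`
		} `json:"meta"`
	} `json:"v"`
}

// parseEntry decodes one registry line into the structure above.
func parseEntry(line string) (registryEntry, error) {
	var e registryEntry
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}

func main() {
	line := `{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[515947789710,1686038123],"cursor":{"offset":5995709},"meta":{"identifier_name":"native","source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson"}}}`
	e, err := parseEntry(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("key=%s offset=%d source=%s\n", e.K, e.V.Cursor.Offset, e.V.Meta.Source)
}
```

Comparing the decoded offset against the stat size of the source file makes the mismatch easy to spot.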

It seems suspicious to me that the value of the cursor is much greater than the size of the file.

My observations:

  • The issue occurs sporadically, but repeatedly, on some of the 1,800 nodes.
  • The affected logfiles on these nodes change.
  • Once filebeat logs a collected logfile as truncated, it does so cyclically at a 10-minute interval.

I'll keep trying to reproduce in a test environment with debugging enabled.

Hi @TiagoQueiroz,

in the mean time I'm facing this issue:

The filestream-input resends every log after a reboot of the system. It does not when the service is restarted with systemctl restart filebeat.

I'm able to reproduce in this test setup:

  • Vanilla installation of a filebeat 8.7.0 on RHEL 7.9 Linux
  • filebeat as a systemctl service
  • filebeats configuration:
$ cat /etc/filebeat/filebeat.yml
logging:
  level: info
  selectors: ["*"]
  metrics.enabled: false

output.logstash:
  hosts: ["XXX"]

filebeat.inputs:
- enabled: true
  id: test-log
  type: filestream
  encoding: plain
  paths:
  - /var/log/filebeat/test.log
  • With a static logfile:
$ cat /var/log/filebeat/test.log
Line 1
Line 2

I noticed that the DeviceID of the file changes during the reboot of the node.

  • Before reboot:
 $ stat /var/log/filebeat/test.log
  File: ‘/var/log/filebeat/test.log’
  Size: 14              Blocks: 8          IO Block: 4096   regular file
Device: fd04h/64772d    Inode: 1749        Links: 1
  • After reboot:
 $ stat /var/log/filebeat/test.log
  File: ‘/var/log/filebeat/test.log’
  Size: 14              Blocks: 8          IO Block: 4096   regular file
Device: fd09h/64777d    Inode: 1749        Links: 1

Do you need some debug logs, or can you confirm this behaviour?

Might this be a problem of how the ID is stored in the registry, like described in this discussion: Filestream input duplicating events after every restart?

Kind regards
Michael

B.t.w. this is the filesystem of the logfile. It's created by lvm:

mount | grep /var
/dev/mapper/XXXVG00-varlv on /var type xfs (rw,relatime,attr2,inode64,noquota)

Thanks @michaelbu !

  1. If deviceIDs and/or inodes are changing, Filebeat will re-ingest the files. To work around it, you can change the file_identity option: filestream input | Filebeat Reference [8.11] | Elastic. You can also take a look at the reading from network shares section; it's meant to work around the same problem of changing inodes/deviceIDs.
  2. Filebeat should not be harvesting its own logs. There is at least one debug logger that will log every event collected by Filebeat, which will make the logs explode. If you need to collect logs from Filebeat, the best way is to deploy a second Filebeat to collect the logs from the first one. You will end up with at least one Filebeat you can't collect logs from, the "monitoring" one.
  3. I'm starting to believe the problems are related to the filesystem, how it's generating events and how Filebeat is reacting to them. Do you know if there is anything on your file system that happens at 10-minute intervals? Like file system snapshots?
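As a sketch of the workaround from point 1 (the file_identity option described in the filestream input docs; the path below is taken from the test setup above, so adjust it to your environment):

```yaml
filebeat.inputs:
- type: filestream
  id: test-log
  paths:
    - /var/log/filebeat/test.log
  # Identify files by path instead of inode + device ID, so a device ID
  # that changes across reboots does not make the file look new.
  # Only safe when paths are stable and log rotation does not reuse names.
  file_identity.path: ~
```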

Unfortunately I have not been able to reproduce this with debug enabled in a test environment so far.

Have you been able to reproduce it without log debugs on your staging environment?

Given my current file system suspicion, does your staging environment use the same file system/configuration as your production one?

I'm also pretty sure we don't truncate our own log files, I'll double check that just in case.

Thank you very much for your thoughts and hints @TiagoQueiroz!

Just to clarify, we now have two different issues:

  • First: resending of events because filebeat logs logfiles as truncated.
  • Second: resending of events because of a system reboot and a change of the deviceID.

For the first issue you suggest to pin the deviceID with the file_identity configuration.
I wonder if there is a better criterion, like the blkid discussed here? But unfortunately I'm not a deep Linux expert.

Can device numbers change after reboot?

The administrative overhead for more than 1,800 filebeats, each collecting logs from up to 10 filesystems, is significant.

To answer your questions:

2.) We have been doing this for more than 3 years, and we make sure to collect only the messages with a severity higher than info. Is this what you are worried about?

3.) This question is related to the first issue:

  • I'm not sure if there are snapshots, but I will try to check this.
  • Staging and production are "identical" in case of the filesystem but different in load.
  • I was just busy with finding the cause for the second issue, so I had no time to reproduce in the staging environment without debug. The good news is that I'm allowed to test in the production, so please give me some time.

Hi @TiagoQueiroz,

I checked if the filesystem is snapshotted: VMware snapshots the VM once per day. So this does not fit our 10-minute pattern.

Hi @michaelbu, thanks for the quick reply!

There is too much variation between deviceIDs and how file systems handle inodes, so there isn't a one-size-fits-all solution. Currently we're working on a new file identity that will solve that. It will likely work by fingerprinting the file instead of relying on any file system feature. You can follow the progress here: [Filebeat] Implement a more stable file identifier · Issue #34419 · elastic/beats · GitHub

Regarding #2, that's mostly it. Even in Elastic-Agent we use a Filebeat to collect the Elastic-Agent and other Beats logs, but it's a separate one, so all "production" workload performed by the Elastic-Agent and Beats is isolated from the one collecting and shipping monitoring data.

Take your time on testing/reproducing this.

Debug logs will help a lot and being able to reproduce it would be the perfect scenario.

A couple of questions:

  1. Is there any tool trying to rotate/manage those logs?
  2. Are there any other files having data duplicated due to truncation?

Hi @TiagoQueiroz,

thanks for your patience!

this will be very helpful. We are heavily affected by this problem and highly interested in this solution :wink:

My Linux expert colleagues confirmed that a changing deviceID across reboots is normal behaviour in the latest Linux distributions.
They also told me that every Linux disk has a UUID which is fixed. But you might be right that there are distributions which handle this differently ...

-bash-4.2$ blkid | grep varlog
/dev/mapper/XXXVG00-varloggraylogsidecarlv: UUID="1e22734d-cc1c-443f-ab15-de977becc05b" TYPE="xfs"

Btw, switching to file_identity will cause all logfiles to be re-read.

To answer your questions:

  1. There is no tool for log rotation.
  2. On the affected systems the logfiles are from db2 and mysql, and also the filebeat log itself.

Hi @TiagoQueiroz,

I would like to share this observation with you. I only managed to see this with the productive filebeat, without debugging enabled. The filestream input collects filebeat's own logs. All other logs are processed by the deprecated log input because of the truncation problem.

filebeat logs its own active logfile as truncated:

$ cat log/filebeat-20230606.ndjson  | jq -r '. | "\(."@timestamp") \(."log.level") \(.message)"' | grep trunc
...
2023-06-08T13:35:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson
2023-06-08T13:45:23.530+0200 info File was truncated. Reading file from offset 0. Path=/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson

Once this happens, the problem seems to get stuck and is repeatedly reported by filebeat.
It looks suspicious to me that the cursor value is never set back to the actual filesize.

-bash-4.2$ date; stat log/filebeat-20230606.ndjson
Thu Jun  8 13:37:25 CEST 2023
  File: ‘log/filebeat-20230606.ndjson’
  Size: 5586417         Blocks: 11016      IO Block: 4096   regular file
Device: fd28h/64808d    Inode: 71          Links: 1
Access: (0600/-rw-------)  Uid: (  389/ graylog)   Gid: (  385/ graylog)
Access: 2023-06-08 13:37:11.801582270 +0200
Modify: 2023-06-08 13:37:05.817630738 +0200
Change: 2023-06-08 13:37:05.817630738 +0200
 Birth: -
-bash-4.2$ date; stat log/filebeat-20230606.ndjson
Thu Jun  8 13:46:25 CEST 2023
  File: ‘log/filebeat-20230606.ndjson’
  Size: 5604283         Blocks: 11144      IO Block: 4096   regular file
Device: fd28h/64808d    Inode: 71          Links: 1
Access: (0600/-rw-------)  Uid: (  389/ graylog)   Gid: (  385/ graylog)
Access: 2023-06-08 13:45:39.805467639 +0200
Modify: 2023-06-08 13:45:39.180472701 +0200
Change: 2023-06-08 13:45:39.180472701 +0200
 Birth: -

In the registry you can see that the cursor keeps growing and is never reset:

-bash-4.2$ tail -F data/registry/filebeat/log.json | grep filebeat-20230606.ndjson
{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[516228400298,1686224409],"cursor":{"offset":608824081},"meta":{"identifier_name":"native","source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson"}}}
{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[515927009430,1686224723],"cursor":{"offset":608825963},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"}}}
{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[516200774197,1686224723],"cursor":{"offset":614425854},"meta":{"identifier_name":"native","source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson"}}}
{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[516201996612,1686224737],"cursor":{"offset":614426484},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"}}}
{"k":"filestream::filebeat::native::71-64808","v":{"ttl":1800000000000,"updated":[516202574976,1686224739],"cursor":{"offset":614428364},"meta":{"source":"/var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson","identifier_name":"native"}}}

Do you have an idea about why this happens?
I would expect that after a truncation filebeat rereads the file and updates the cursor to the actual filesize?

Btw my script monitors the change of the filesize every 0.6 seconds and does not detect a truncation:

Thu Jun  8 13:40:08 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5589557 to 5596428
Thu Jun  8 13:45:13 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5596428 to 5597041
Thu Jun  8 13:45:14 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5597041 to 5599356
Thu Jun  8 13:45:15 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5599356 to 5600615
Thu Jun  8 13:45:17 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5600615 to 5601240
Thu Jun  8 13:45:23 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5601240 to 5601773
Thu Jun  8 13:45:35 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5601773 to 5602403
Thu Jun  8 13:45:39 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5602403 to 5604283
Thu Jun  8 13:48:03 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5604283 to 5604907
Thu Jun  8 13:49:03 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5604907 to 5605534
Thu Jun  8 13:49:08 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5605534 to 5606160
Thu Jun  8 13:49:09 CEST 2023: File /var/lib/graylog-sidecar/collectors/filebeat/log/filebeat-20230606.ndjson size changed from 5606160 to 5606788
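The monitoring script itself was not shared in this thread; a hypothetical Go equivalent that polls the size the same way could look like this:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// classify compares two successive size samples: a decrease would
// indicate a real truncation at the file system level.
func classify(prev, cur int64) string {
	switch {
	case cur < prev:
		return "truncated"
	case cur > prev:
		return "grew"
	default:
		return "unchanged"
	}
}

// watchSize polls the file size at a fixed interval (0.6s in the
// original script) and reports every change. This is a hypothetical
// reimplementation, not the script whose output is shown above.
func watchSize(path string, interval time.Duration, rounds int) {
	var last int64 = -1
	for i := 0; i < rounds; i++ {
		fi, err := os.Stat(path)
		if err == nil {
			if last >= 0 && fi.Size() != last {
				fmt.Printf("%s: size changed from %d to %d (%s)\n",
					path, last, fi.Size(), classify(last, fi.Size()))
			}
			last = fi.Size()
		}
		time.Sleep(interval)
	}
}

func main() {
	// Example invocation; the path is a placeholder.
	watchSize("/tmp/sample.log", 600*time.Millisecond, 3)
}
```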

@michaelbu thanks a lot for all the information!

The good news is that I managed to reproduce a similar behaviour:

  • Filestream input is reading any file
  • The file gets truncated (`truncate --size 0 /path/to/file`)
  • Filestream detects the truncation and logs accordingly
  • The file size is not reset in the cursor (registry)
  • However, stat shows the correct size (zero, then it starts increasing again)

The key differences I see from my environment to yours:

  • I'm running Arch Linux
  • I'm using ext4 instead of xfs

Without restarting Filebeat I did not perceive any data duplication.

This is a bug in Filestream, the cursor in the registry should be updated when the file is truncated. I'll further investigate next week.

One interesting thing I noticed in your Filebeat configuration is that the input that reads the Filebeat logs drops the events mentioning file truncation; here is the relevant snippet:

  processors:
  - drop_event:
      when:
        or:
          - equals:
              message: "File was truncated. Reading file from offset 0. Path=/var/log/graylog-sidecar/filebeat_stderr.log"
          - and:
            - equals:
                log.level: "info"
            - not:
                contains:
                  message: "File was truncated. Reading file from offset "

Did you write this Filebeat configuration or is it "auto-generated" by GrayLog?

My current theory is that GrayLog is somehow truncating the file.

However I'm very puzzled that neither stat nor your script is showing this.

Could you share your monitoring script? I would like to run some tests using the same script.


Hi @TiagoQueiroz,

that are very good news! Thank you very much for your investigation!

I also have good news: I was able to reproduce with debug enabled in our production environment and would like to share this data with you:

  • debug logs
  • registry content
  • filebeat configuration
  • the script for checking the filesize

Am I missing something?
Can I share it with you in a private way, because it contains sensitive information?

Btw. I also figured out that filebeat collecting its own logs leads to truncations because of filebeat's log rotation. This happens once per rotation and seems ok to me.
Do you think collecting filebeat's own logs with a journald input instead of filestream is a good workaround for this? If so, is there a way to get this output in json?

Great to hear that you found an issue with the cursor in some circumstances!
In my understanding, something breaks the registry, and once this happens it is not repaired and the logfile is read over and over again until it's closed?

Let me answer your questions:

Did you write this Filebeat configuration or is it "auto-generated" by GrayLog?

The drop_event is configured by us and should drop all messages which are not significant for us.
These are the conditions:

  • A logfile like /var/log/graylog-sidecar/filebeat_stderr.log is written completely anew. This is truncation "by design" and ok.
  • We do not want to collect info messages, except the truncation ones.

My current theory is that GrayLog is somehow truncating the file.

Graylog (Sidecar) distributes the self-written filebeat configuration to the nodes and restarts the filebeat process.
I do not think that Graylog truncates anything.
I checked if graylog-sidecar restarts filebeat or starts it multiple times, but I do not see anything suspicious: filebeat has been running for a couple of days, and there is a mechanism in sidecar that ensures that filebeat only runs once.

I'm using ext4 instead of xfs
Maybe the fi.Size() method in the code determines the wrong value? Do you have a code snippet that I can run in production to check?

Thank you very much!

Regards
Michael