Filebeat filestream input rereading rotated log files

Hello!

I'm running into the very common problem of rotated files being reread and resent. I'm using Filebeat 8.1.0 with the new(ish) filestream input plugin. I've read the plugin's docs and the article on this specific issue, but I still haven't managed to fix the problem. I must be misunderstanding one of the plugin's many options, so here's the relevant part of my Filebeat config:

- type: filestream
  close.on_state_change.renamed: false
  close.on_state_change.removed: false
  clean_removed: false
  close.on_state_change.inactive: 20m
  ignore_older: 72h
  clean_inactive: 84h
  paths:
    - ${LOGS_PATH:?LOGS_PATH not set}/ldap/ldap.log*
  fields:
    log_type: ldap
    container_name: ldap

The log files are rotated by Logrotate, the corresponding config is the following:

/opt/project/logs/ldap/ldap.log {
    daily
    maxsize 34M
    missingok
    rotate 30
    nocompress
    nocreate
    notifempty

    sharedscripts
    postrotate
        systemctl kill -s HUP rsyslog.service
    endscript
}

This config rotates files by renaming and not by copying, so it should be fine. (Even checked inode numbers with stat, just to be sure.)
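For anyone who wants to replicate the inode check: the idea is that a rename-based rotation keeps the inode, so the inode of ldap.log.1 after rotation should match the inode ldap.log had before. A quick sketch (paths are from my setup, adjust as needed):

```shell
#!/bin/bash
# Print inode and name for the whole log family; run before and after a
# rotation and compare. With rename-based rotation the inode "moves" to
# the .1 file; with copytruncate the .1 file would get a NEW inode.
stat -c '%i %n' /opt/project/logs/ldap/ldap.log*
```

If the numbers shift positions but stay the same set, rotation is rename-based.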

So, can anyone help me out here? :slight_smile: I can provide further info if needed.

(Filebeat is running in a custom Docker container, but I don't think it's relevant in this case.)

Did some further digging. On the previous version of this system/project the issue is not present. I was using Filebeat 6.4.2 with the following log input configuration:

- type: log
  close_renamed: false
  close_removed: false
  clean_removed: false
  close_inactive: 20m
  ignore_older: 72h
  clean_inactive: 84h
  close_timeout: 96h
  paths:
    - ${LOGS_PATH:?LOGS_PATH not set}/ldap/ldap.log*
  fields:
    log_type: ldap
    container_name: ldap

I must have failed somewhere during the migration to filestream, though I'm not sure where. The only real change (well, apart from the input type itself) is the removal of close_timeout (now close.reader.after_interval, I believe), but I did that on purpose, and I don't think it can be the cause of my issues.
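For reference, here is my understanding of how the old log input options map to filestream ones (based on the migration docs; corrections welcome):

```yaml
- type: filestream
  close.on_state_change.renamed: false   # was: close_renamed
  close.on_state_change.removed: false   # was: close_removed
  close.on_state_change.inactive: 20m    # was: close_inactive
  close.reader.after_interval: 96h       # was: close_timeout (the one I dropped)
  ignore_older: 72h                      # unchanged
  clean_inactive: 84h                    # unchanged
  clean_removed: false                   # unchanged
```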

Hi!

Took a more serious look into the issue, as our "live" system is sadly quite efficient at reproducing it (much more so than I am). After some struggling I think I managed to put together a POC that consistently triggers the issue.

filebeat.yml:

filebeat.inputs:
  - type: filestream
    paths:
      - /tmp/mock.log*

output.file:
  path: "/tmp/filebeat"
  filename: filebeat

logrotate.conf:

su root adm

/tmp/mock.log {
    daily
    missingok
    rotate 100
    notifempty
    nocreate
}

Mock service w/ logging and forced rotation:

#!/bin/bash

set -eu

rotations=20
events_per_file=200

for rotation in $(seq -f "%03g" $rotations); do
  for event in $(seq -f "%03g" $events_per_file); do
    echo "event_${rotation}_${event}" >> /tmp/mock.log
    sleep 0.01
  done

  logrotate -f logrotate.conf
  sleep 5
  echo "finished rotation $rotation/$rotations"
done

echo "Emitted $(( $rotations * $events_per_file )) events"

The script assumes that logrotate.conf is placed next to it. It's a bit brutish and slow, sorry about that; changing the timings can make the issue disappear, so I stopped optimizing it. (: Anyway, it emits 4000 events (verify: wc -l /tmp/mock.log*), but the output of Filebeat will end up having more lines than that; check wc -l /tmp/filebeat/filebeat-*. (For me it's 4800.)
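Since every event id emitted by the script is unique, you can go beyond comparing line counts and list exactly which events were resent. A small sketch, assuming the event_XXX_YYY naming from the script above and Filebeat's default file-output location from my config:

```shell
#!/bin/bash
# Extract every event id from the Filebeat output (the JSON wrapping
# doesn't matter, we only grep for the ids), then print any id that
# appears more than once, i.e. was duplicated.
grep -ohE 'event_[0-9]{3}_[0-9]{3}' /tmp/filebeat/filebeat-* | sort | uniq -d
```

An empty result would mean no duplicates; in my runs it prints whole blocks of ids belonging to the rotated files.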

Checking the content of the output shows that some files were read more than once. As I did not use any special settings I believe it is an actual bug. Could somebody confirm this?

Also note that using the deprecated log input in this same scenario does produce the expected results, so I think it's an issue with the filestream input.

Giving this thread a small nudge.

I'm still waiting for a confirmation on this issue, so that I can make a bug report over at github. Unless I'm getting something terribly wrong this is a severe-ish regression in comparison to the log input.

Could you please share the debug logs of Filebeat from the test scenario you came up with? Also, please share how the inodes of the input files change during the test.

I modified the test to show the inodes each iteration as you requested, and realized that the issue always happens when the "main" log file (mock.log) is first rotated (to mock.log.1). That in turn led me to the sad discovery that the test is in fact failing to show the actual issue, and is only showing the symptoms of a wrong combination of configurations -- or maybe a different issue:

It seems that logrotate's nocreate combined with Filebeat's default of cleaning removed files from the registry may cause rotated files to be reread (probably when the "main" log file is not recreated for "long enough" due to a lack of events). I always assumed that the docs were simply wrong about clean_removed being name based ("When this option is enabled, Filebeat cleans files from the registry if they cannot be found on disk anymore under the last known name."), but it looks like they aren't wrong after all. Which is quite strange in combination with closing being inode based. (And sort of makes it impossible to always close files before cleaning them?)

The sad part is that this isn't even my actual issue, as I have clean_removed disabled, and I also have older generations of log files being reread. I guess I could try to produce a debug log in the "prod" environment, but I'd need a filter for the -d flag to stop it from being humongous.

Anyway, here are the files you requested, though I'm not sure if they're of much interest any more:
https://drive.google.com/drive/folders/1HbT3qKrz4lqmp5Qye3tlTmpEm3Wi32sg?usp=sharing

Thank you!

We had a similar issue when files were resent when Filebeat was restarted and multiple inputs were configured. The trick was to set an ID. But this issue seems completely unrelated unfortunately.
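For anyone landing here with that restart-and-resend variant: the trick is simply giving each filestream input its own unique id, something like this (the id string itself is arbitrary, it just has to be unique across inputs):

```yaml
filebeat.inputs:
  - type: filestream
    id: ldap-logs        # unique per input; without it, multiple
                         # filestream inputs can clash in the registry
    paths:
      - /logs/ldap/ldap.log*
```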

Ookay, it took my sanity, but I think I actually managed to track down the real issue this time: if a "shorter" file takes the place (name) of a "longer" one, it will be reread. So if app.log is shorter than app.log.1, then after (each) rotation it will be read again.

Smallest POC I can come up with:

filebeat.yml:

filebeat.inputs:
  - type: filestream
    paths:
      - /tmp/mock.log*

output.file:
  path: "/tmp/filebeat"
  filename: filebeat

logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat

Workflow:

# First "long" log file
printf 'log_line\nlog_line\n' > mock.log
# Give fb some time to settle
sleep 10
# "Rotate"
mv mock.log mock.log.1 && touch mock.log
# Give fb some time to settle
sleep 10
# Second shorter log file
echo ANCHOR > mock.log
# Give fb some time to settle
sleep 10 
# Rotate again
mv mock.log.1 mock.log.2 && mv mock.log mock.log.1 && touch mock.log
# Give fb some time to settle
sleep 10
# Rotate again
mv mock.log.2 mock.log.3 && mv mock.log.1 mock.log.2 && mv mock.log mock.log.1 && touch mock.log
# Give fb some time to settle
sleep 10
# Check where "ANCHOR" was read from
cat /tmp/filebeat/* | grep ANCHOR | egrep -o "mock.log(\.[0-9])?"
mock.log
mock.log.1
mock.log.2

Debug log corresponding to this run:

I'm not sure if this is the full extent of the issue: in the debug log I can see lines like "File XY has been truncated", which are clearly wrong, and in the logs of the prod system I also saw some other lines that could be concerning. But it's very noisy there and as such difficult to follow, so I could be wrong here.

Are you sure you are not seeing the same problem with the log input? Both the filestream and log inputs reread a file from the beginning if the input file is truncated. That is the expected behaviour, so there should not be any difference between the two inputs.

Yes, I'm certain.

Also note that there are no actual truncations going on. The files are moved (renamed), not copytruncated. It could only be considered truncation if file identification was configured to be name based, but that's not the case.
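(For completeness, if I read the docs right, name-based identification would have to be opted into explicitly, which I haven't done; filestream defaults to native, i.e. inode+device, identity. The opt-in would look something like this:)

```yaml
- type: filestream
  file_identity.path: ~   # name-based identity; NOT what I'm running
```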

OK, I think I have found the issue in the code. I will test my theory and submit a fix in the next few days. Thank you so much for taking the time to reproduce the issue and for giving such a detailed description of the problem.


PR: Detect new files under known paths in filestream input by kvch · Pull Request #31268 · elastic/beats · GitHub

Well, that was fast. :slight_smile: You're the best, thank you.

Been using 8.2.0 for a few days now, and the fix seems to be working fine. Thanks again.


Actually... :sweat_smile: There's some noise involved with this fix: For each rotation of each file the following is logged by Filebeat: Error while getting cursor meta data of entry <some_id>: requires pointer

A full example:

{"log.level":"error","@timestamp":"2022-05-11T22:10:17.400Z","log.logger":"input.filestream","log.origin":{"file.name":"filestream/prospector.go","file.line":266},"message":"Error while getting cursor meta data of entry native::6294505-2050: requires pointer","service.name":"filebeat","id":"rabbitmq-rabbitmq","prospector":"file_prospector","operation":"rename","source_name":"native::6294505-2050","os_id":"6294505-2050","new_path":"/logs/rabbitmq/rabbitmq.log.5","old_path":"/logs/rabbitmq/rabbitmq.log.4","ecs.version":"1.6.0"}

So far it seems to be a cosmetic issue only (?).