I'm running into this very common problem of rotated files being reread and resent. I'm using Filebeat 8.1.0 with the new(ish) filestream input plugin. I've read the plugin's docs and the article on this specific issue, but I still haven't managed to fix the problem. I must be misunderstanding one of the plugin's many options, so here's the relevant part of my Filebeat config:
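To give an idea of the shape without pasting the exact values, it's a minimal filestream input along these lines (the id and paths below are placeholders, not my real ones):

filebeat.inputs:
  - type: filestream
    id: mock-logs                # placeholder id, not my real one
    paths:
      - /tmp/mock.log*
    # file identity is left at the default (native, i.e. inode + device)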
Did some further digging. On the previous version of this system/project the issue was not present; there I was using Filebeat 6.4.2 with the following log input configuration:
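Roughly along these lines (the values are placeholders, but close_timeout is the option that matters for what follows):

filebeat.inputs:
  - type: log
    paths:
      - /tmp/mock.log*
    # close the harvester after a fixed time, even if the file is still being written
    close_timeout: 5m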
I must have gone wrong somewhere during the migration to filestream, though I'm not sure where. The only real change (well, apart from the input type itself) is the removal of close_timeout (now called close.reader.after_interval, I believe), but I did that on purpose, and I don't think it can be the cause of my issues.
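For reference, the mapping I mean, assuming I read the docs correctly (the 5m value is just an example, and each key goes under its respective input):

# log input (6.x)
close_timeout: 5m
# filestream equivalent (8.x)
close.reader.after_interval: 5m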
Took a more serious look into the issue, as our "live" system is sadly quite efficient at reproducing it. Much more so than I am, in fact, but after some struggling I think I managed to put together a POC that consistently triggers the issue.
#!/bin/bash
set -eu
rotations=20
events_per_file=200
for rotation in $(seq -f "%03g" $rotations); do
    for event in $(seq -f "%03g" $events_per_file); do
        # append events one at a time, with a small delay between them
        echo "event_${rotation}_${event}" >> /tmp/mock.log
        sleep 0.01
    done
    # rotate, then give Filebeat some time to catch up
    logrotate -f logrotate.conf
    sleep 5
    echo "finished rotation $rotation/$rotations"
done

echo "Emitted $(( $rotations * $events_per_file )) events"
The script assumes that logrotate.conf is placed next to it. It's a bit brutish and slow, sorry about that. Changing the timings can make the issue disappear, so I stopped optimizing it. (: Anyway, it emits 4000 log lines (verify with wc -l /tmp/mock.log*), but Filebeat's output will end up with more lines than that; check wc -l /tmp/filebeat/filebeat-*. (For me it's 4800.)
Checking the content of the output shows that some files were read more than once. As I did not use any special settings, I believe this is an actual bug. Could somebody confirm?
Also note that using the deprecated log input in this same scenario does produce the expected results, so I think it's an issue with the filestream input.
I'm still waiting for confirmation on this issue so that I can file a bug report over at GitHub. Unless I'm getting something terribly wrong, this is a severe-ish regression compared to the log input.
I modified the test to print the inodes on each iteration as you requested, and realized that the issue always happens when the "main" log file (mock.log) is rotated (to mock.log.1) for the first time. This in turn led me to the sad discovery that the test is in fact failing to show the actual issue, and is only showing the symptoms of a wrong combination of configurations -- or maybe a different issue:
It seems that logrotate's nocreate combined with Filebeat's default of cleaning removed files (clean_removed) may cause rotated files to be reread. (Probably when the "main" log file isn't recreated for "long enough" due to a lack of events.) I always assumed the docs were simply wrong about clean_removed being name based ("When this option is enabled, Filebeat cleans files from the registry if they cannot be found on disk anymore under the last known name."), but it looks like they're not -- it really is name based. That is quite strange in combination with closing being inode based. (And it sort of makes it impossible to always close files before they're cleaned?)
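For clarity, this is the option I mean; turning that cleanup off on the filestream input would look roughly like this (id and paths are placeholders):

filebeat.inputs:
  - type: filestream
    id: mock-logs                  # placeholder id
    paths:
      - /tmp/mock.log*
    # keep registry entries even if no file is currently found under the last known name
    clean_removed: false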
The sad part is that this isn't even my actual issue, as I have clean_removed disabled. And I also see older generations of log files being reread. I guess I could try to capture a debug log in the "prod" environment, but I'd need a filter for the -d flag to stop it from being humongous.
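In filebeat.yml that would presumably be something like the following; the selector names are a guess on my part, based on the log.logger fields I see in the output:

logging.level: debug
logging.selectors: ["input", "input.filestream"]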
We had a similar issue where files were resent when Filebeat was restarted and multiple inputs were configured. The trick was to set a unique ID on each input. But this issue seems completely unrelated, unfortunately.
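For what it's worth, what I mean by setting an ID is giving each filestream input its own unique id, roughly like this (paths are just examples):

filebeat.inputs:
  - type: filestream
    id: app-logs              # must be unique per filestream input
    paths:
      - /var/log/app/*.log
  - type: filestream
    id: sys-logs
    paths:
      - /var/log/sys/*.log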
Ookay, it took my sanity, but I think I actually managed to track down the real issue this time: if a "shorter" file takes the place (name) of a "longer" one, it will be reread. So if app.log is shorter than app.log.1, then after (each) rotation it will be read again.
# First "long" log file
echo -n "log_line\nlog_line" > mock.log
# Give fb some time to settle
sleep 10
# "Rotate"
mv mock.log mock.log.1 && touch mock.log
# Give fb some time to settle
sleep 10
# Second shorter log file
echo ANCHOR > mock.log
# Give fb some time to settle
sleep 10
# Rotate again
mv mock.log.1 mock.log.2 && mv mock.log mock.log.1 && touch mock.log
# Give fb some time to settle
sleep 10
# Rotate again
mv mock.log.2 mock.log.3 && mv mock.log.1 mock.log.2 && mv mock.log mock.log.1 && touch mock.log
# Give fb some time to settle
sleep 10
# Check where "ANCHOR" was read from
cat /tmp/filebeat/* | grep ANCHOR | grep -Eo "mock\.log(\.[0-9])?"
mock.log
mock.log.1
mock.log.2
Debug log corresponding to this run:
I'm not sure whether this is the full extent of the issue. In the debug log I can see lines like "File XY has been truncated", which are clearly wrong, but in the logs of the prod system I also saw some other lines that could be concerning. It's very noisy there, though, and as such difficult to follow, so I could be wrong here.
Are you sure you are not seeing the same problem with the log input? Both the filestream and the log input reread files from the beginning if the input file is truncated. That is the expected behaviour, so there should not be any difference between the two inputs.
Also note that there are no actual truncations going on. The files are moved (renamed), not rotated with copytruncate. It could only be considered truncation if file identification were configured to be name based, but that's not the case.
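To be explicit, I'm running with the default identity, which as far as I understand is equivalent to this (switching to name-based identity via file_identity.path would be the opposite of my setup):

filebeat.inputs:
  - type: filestream
    id: mock-logs                # placeholder id
    paths:
      - /tmp/mock.log*
    # the default: files are tracked by inode + device, not by path
    file_identity.native: ~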
OK, I think I have found the issue in the code. I will test my theory and submit a fix in the next few days. Thank you so much for taking the time to reproduce the issue and for giving such a detailed description of the problem.
Actually... there's some noise involved with this fix: for each rotation of each file, Filebeat logs the following: Error while getting cursor meta data of entry <some_id>: requires pointer
A full example:
{"log.level":"error","@timestamp":"2022-05-11T22:10:17.400Z","log.logger":"input.filestream","log.origin":{"file.name":"filestream/prospector.go","file.line":266},"message":"Error while getting cursor meta data of entry native::6294505-2050: requires pointer","service.name":"filebeat","id":"rabbitmq-rabbitmq","prospector":"file_prospector","operation":"rename","source_name":"native::6294505-2050","os_id":"6294505-2050","new_path":"/logs/rabbitmq/rabbitmq.log.5","old_path":"/logs/rabbitmq/rabbitmq.log.4","ecs.version":"1.6.0"}