Hello,
I am trying to understand how filebeat reacts to files that are dumped all at once every x hours (as opposed to log files that are fed on the fly, line by line).
For instance, I have a file /log/audit.dump that is recreated from scratch by a script every day at midnight with new content. It is a simple csv file with a few thousand lines.
I have witnessed several behaviors from filebeat:
The new generated file is completely read and the new lines are sent to logstash (expected behavior)
The file is read starting at the previously stored offset and the first line sent to logstash is truncated
The file is not read at all as if filebeat had not detected the file change.
Could you please shed some light on how one is supposed to approach this kind of situation?
Am I missing something in the way I am configuring filebeat?
I think the reason behind the behaviour is as follows:
I assume that no new file is created; instead, the existing file (with the same inode) is reused and the new content is written into it. This has the following consequences:
Because the inode stays the same, filebeat assumes it is still the same file.
If the new content is shorter than the content before, it assumes that something strange happened and starts reading from the beginning (your expected behaviour)
If the file is longer than the previous one, it reads the lines from the old offset, which could start in the middle of a line
If the file has the same length, nothing happens.
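The three cases above can be sketched as a small decision function. This is only a simplified model of the behaviour described in this post, not filebeat's actual code; the function name, its parameters and the returned labels are all made up for illustration.

```python
def resume_action(saved_inode, saved_offset, current_inode, current_size):
    """Simplified model (an assumption, not filebeat source code) of how a
    reader that tracks files by inode and offset would treat a file."""
    if current_inode != saved_inode:
        # Different inode: treated as a brand new file.
        return "new_file_read_from_start"
    if current_size < saved_offset:
        # Same inode but shorter than what was already read: looks truncated.
        return "truncated_read_from_start"
    if current_size > saved_offset:
        # Same inode and longer: continue at the old offset,
        # which may land in the middle of a line of the new content.
        return "resume_at_saved_offset"
    # Same inode, same length: nothing appears to have changed.
    return "no_new_data"
```

For example, with a stored offset of 5000 bytes and an unchanged inode, a new dump of 4000 bytes is read from the start, a dump of 6000 bytes is picked up at byte 5000, and a dump of exactly 5000 bytes is ignored.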
So the behaviour is somewhat expected. To solve this issue, I recommend that you rename the old file, create the new file and dump the content into it, then remove the old file. I do not recommend deleting and recreating the file directly, as this could lead to inode reuse: https://github.com/elastic/beats/issues/1341
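A minimal sketch of the rename, create, remove sequence, assuming the dump script can be changed and is written in Python. The function name `replace_dump` and the `.old` suffix are hypothetical choices for illustration only.

```python
import os


def replace_dump(path, new_content):
    """Write new_content to path as a new file with a new inode.

    Hypothetical helper: the old file is renamed first so that its inode is
    still in use while the new file is created, then it is removed.
    """
    old_path = path + ".old"  # assumed suffix, pick any name filebeat does not watch
    if os.path.exists(path):
        os.rename(path, old_path)
    # The old inode is still held by old_path, so this file gets a different one.
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_content)
    if os.path.exists(old_path):
        os.remove(old_path)
```

Because the old file still exists under its temporary name when the new one is created, the filesystem cannot hand the same inode to the new file at that moment. Make sure the temporary name does not match the paths configured in filebeat, otherwise the old content may be picked up again.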
@ruffin Thanks for your quick answer, I will ask the dev to try and modify the way they create the file.
Indeed, I confirm that the inode remains the same and that it matches the one from the registry file.
I will keep you posted once the modification has been performed.