I am using Filebeat to read a set of files logged using the Java Logback library. Files are named with the format "prefix-%d{yyyy-MM-dd}.%i.json", where %i is an index that Logback increments on rollover. Logback is configured to remove old files based on total size and age. With this naming strategy, and Filebeat matching any file named "prefix-*", there is no need to rename files on rollover, so once a file is created it keeps the same name until it's removed. That seemed to be all well and good.
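For reference, the Logback side looks roughly like this (the appender name, encoder class, and size/time limits below are illustrative, not our exact values):

```xml
<appender name="JSON" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
    <!-- %d dates the file, %i increments on size-based rollover: no renames -->
    <fileNamePattern>/var/log/myapp/prefix-%d{yyyy-MM-dd}.%i.json</fileNamePattern>
    <maxFileSize>50MB</maxFileSize>
    <!-- old files are deleted based on age and total size -->
    <maxHistory>7</maxHistory>
    <totalSizeCap>1GB</totalSizeCap>
  </rollingPolicy>
  <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
</appender>
```

The key point is that with a fileNamePattern containing both %d and %i, Logback writes directly to the final name, so files are only ever created and deleted, never renamed.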
However, we are observing inode reuse on the filesystem (ext4), and I am not sure how to configure the clean_* and close_* options to handle this scenario. It seems that, with the way we have Filebeat configured, the deletion of file X and the reuse of its inode number by file Y is interpreted as a rename from X to Y. In our scenario, renames and moves never happen; only creates and deletes.
I believe this is also compounded when issues in our pipeline cause Logstash to stop processing. When this happens, Filebeat is unable to ship any logs and harvesters are neither created nor progressed, but meanwhile the application keeps logging to new files and deleting old ones, potentially reusing inodes. I was able to reproduce this on the first attempt by stopping Filebeat and then triggering enough logging in the application to cause a lot of rollover, as a proxy for what happens when Logstash is unavailable. Given how reproducible it is, this doesn't seem like a "one in a million" kind of thing. Granted, we want Logstash to stay up, but if we do get downtime, we'd like to avoid the data loss caused by inode reuse.
We're using the following settings (I think these are all that's relevant, let me know if more is needed):
close_inactive: 5m
close_renamed: false
close_removed: true
close_eof: false
clean_inactive: 0
clean_removed: true
close_timeout: 0
We're on Filebeat 5.2.2. These are all just the defaults we've taken from the Filebeat puppet module we're using, which I believe match Filebeat's own defaults.
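In filebeat.yml terms, the prospector looks roughly like this (the log path is illustrative; the option values are the ones listed above):

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/myapp/prefix-*.json
  close_inactive: 5m
  close_renamed: false
  close_removed: true
  close_eof: false
  clean_inactive: 0
  clean_removed: true
  close_timeout: 0
```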
In reading the config docs, I expected clean_removed to be the answer, but it was already enabled when this problem occurred. Then clean_inactive looked like the right setting, but I'm a bit worried by what "inactivity" means. I'm happy to forget files once Filebeat has shipped them to Logstash, but I don't want to clean state for files which are inactive yet not fully processed by Filebeat. From the docs, I believe I have to set ignore_older, which goes by modification timestamp, in conjunction with clean_inactive. My worry is that a direct comparison of current time against modification timestamp is disconnected from what Filebeat has actually processed: to prevent inode reuse, I'd have to set ignore_older short enough that Filebeat would begin ignoring files not because they're old, but because it can't ship them downstream, or was shut down for whatever reason. Once Logback deletes files, that data is gone forever, and that's a trade-off we accept; ideally Filebeat could be configured to simply follow Logback, without another set of semantics to understand.
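If I were to go the clean_inactive route, my understanding of the docs is that it would have to be paired like this (the values are illustrative, not a recommendation; the documented constraint is that clean_inactive must be greater than ignore_older plus scan_frequency):

```yaml
  # files untouched for this long are ignored (compares mtime to now)
  ignore_older: 1h
  # registry state is dropped this long after last activity;
  # must exceed ignore_older + scan_frequency
  clean_inactive: 2h
  scan_frequency: 10s
```

This is exactly the coupling I'm uneasy about: both timers are driven by wall-clock time against mtime, not by whether Filebeat has actually shipped the file's contents.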
What is the best way to configure Filebeat so that what looks to it like a rename (a different file reusing the same inode) is recognised as a new file, and so that it never attempts to harvest an inode under a filename that no longer exists?