Deleting Filebeat Registry File


(Bill) #1

Is it safe to delete the Filebeat registry file?

I see entries in there going back to when I first started running Filebeat. This has been causing inode reuse issues, where Filebeat keeps mistaking new files for old ones it has already seen.

For more information, see: Old file with new name found

I thought it would be good to ask in a separate thread to document the impact of deleting the registry file.

I think that once ignore_older is reached, the entry should be removed from the registry file. Perhaps a directive could also be added to the configuration to let a prospector ignore older inodes that might still appear in the file.

Thanks.


(Steffen Siering) #2

The registry file is used to resume from the last known position. Deleting the complete registry file is not safe, as this might affect files that are currently being processed.

One workaround for now is a 'garbage collector' script that deletes individual entries from the registry file (which is plain JSON). The script should only be run while Filebeat is stopped.
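A minimal Python sketch of such a script, assuming the Filebeat 1.x registry layout: a single JSON object keyed by source path, where each value holds `source`, `offset` and `FileStateOS` (`inode`, `device`). The function name `remove_entries` and the `.bak` backup are illustrative choices, not part of Filebeat.

```python
import json
import os
import shutil
import tempfile


def remove_entries(registry_path, sources):
    """Delete the given source paths from a Filebeat 1.x-style registry.

    Assumes the registry is one JSON object keyed by source path and that
    Filebeat is stopped while this runs. Returns the paths actually removed.
    """
    with open(registry_path) as f:
        registry = json.load(f)

    removed = []
    for source in sources:
        if source in registry:
            del registry[source]
            removed.append(source)

    # Keep a copy of the original registry in case something goes wrong.
    shutil.copy2(registry_path, registry_path + ".bak")

    # Write to a temp file in the same directory, then swap it in atomically.
    directory = os.path.dirname(os.path.abspath(registry_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(registry, f)
    os.replace(tmp_path, registry_path)
    return removed
```

For example, remove_entries(path_to_your_registry, ["/var/log/app/old.log"]) drops one stale entry; check the .bak copy before restarting Filebeat.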

Right, once ignore_older is reached a file is no longer reloaded, so its entry should be safe to delete. But with the 1.2 release, ignore_older defaults to 'infinite'. A better solution might be to have Filebeat generate a fingerprint in addition to the inode, then remove an old entry if its inode + fingerprint has been gone for a while (maybe N hours), and also remove the old entry if its inode is found again but the fingerprint has changed.
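The fingerprint idea can be sketched roughly as follows. This is not Filebeat code: the `fingerprint` field stored on the entry is hypothetical (1.x entries carry only `source`, `offset` and `FileStateOS`), and hashing the first 1024 bytes with SHA-256 is an assumed choice. Files shorter than the fingerprint window are skipped, since a hash over fewer bytes would change as the file grows.

```python
import hashlib
import os

FINGERPRINT_BYTES = 1024  # assumed window size ("a few bytes" of content)


def fingerprint(path, n=FINGERPRINT_BYTES):
    """SHA-256 of the first n bytes, or None if the file is shorter than n."""
    with open(path, "rb") as f:
        head = f.read(n)
    if len(head) < n:
        # A hash over fewer than n bytes would change as the file grows.
        return None
    return hashlib.sha256(head).hexdigest()


def entry_is_stale(entry, path):
    """True if `path` has the entry's inode/device but different content.

    entry["fingerprint"] is a hypothetical field; Filebeat 1.x does not store it.
    """
    st = os.stat(path)
    state = entry["FileStateOS"]
    if (st.st_ino, st.st_dev) != (state["inode"], state["device"]):
        return False  # different file identity; this entry is not involved
    stored = entry.get("fingerprint")
    current = fingerprint(path)
    if stored is None or current is None:
        return False  # not enough information to decide
    return current != stored
```

When entry_is_stale returns True, the entry belongs to a file that no longer exists, so it could be dropped and the new file read from offset zero.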


(Bill) #3

It's not reloaded, but what I am observing is that Filebeat keeps treating my current logs as an "update" to an older file in the registry because of inode reuse. These logs are compressed at a fixed interval, so the original files disappear and their inodes get reused. This has been leading to gaps in my data, because Filebeat tries to resume at the offset where it believes it left off, which can be very far into the new log.

For example, with a 10-minute log, the stale offset means no data appears in Logstash until the 9th minute of the log, which is effectively a 9-minute gap. In other cases the entire interval is missing (no data for that chunk is sent at all), and in others Filebeat seems to restart from offset zero. I have not been able to pinpoint when it goes back to offset zero versus sending nothing at all; in both cases it seems the stored offset does not exist within the new file.
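To gather evidence for this kind of problem, a small diagnostic can cross-check the registry against the files currently on disk. The sketch below assumes the Filebeat 1.x registry layout, and `find_inode_collisions` is a hypothetical helper. It flags entries whose inode and device now belong to a file at a different path, and marks cases where the stored offset lies past the end of that file.

```python
import os


def find_inode_collisions(registry, log_paths):
    """List registry entries whose (inode, device) now belongs to another path.

    registry: dict loaded from a Filebeat 1.x-style registry (assumed layout).
    log_paths: files currently on disk, e.g. from glob.glob on the prospector path.
    """
    # Index the current files by identity.
    current = {}
    for path in log_paths:
        st = os.stat(path)
        current[(st.st_ino, st.st_dev)] = (path, st.st_size)

    report = []
    for source, entry in registry.items():
        state = entry["FileStateOS"]
        match = current.get((state["inode"], state["device"]))
        if match is None or match[0] == source:
            continue  # no collision for this entry
        path, size = match
        report.append({
            "old_source": source,
            "new_path": path,
            "stored_offset": entry["offset"],
            "current_size": size,
            # True when the stored offset lies past the end of the current file.
            "offset_past_eof": entry["offset"] > size,
        })
    return report
```

A legitimate rename during rotation is also flagged; an offset past EOF is the stronger hint that the inode was reused by an unrelated file.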

Do you know of any existing garbage collection script? If not, I may have to write one and share it with the community. Thanks.


(Steffen Siering) #4

Sorry, I'm not aware of any GC script. It sounds like the problem you're facing requires a script that stops Filebeat during log rotation. That's very bad, as this should be handled by Filebeat itself, not by operators writing highly custom scripts.

I believe fingerprinting files (computing a hash over a few bytes of file content) will solve the issue at hand.

I think it makes sense to track this one as a bug. I didn't find a ticket on GitHub that exactly matches the problems you're facing. Could you open one, ideally including some $ ls output and registry file excerpts that illustrate the problem? That will also let you track any progress.

Thanks.


(Bill) #5

Well, I cobbled together a bash script that does what I need by checking whether each file listed in the registry still exists. If it does, the script echoes that entry's JSON string, and the output is pieced back together into a new registry. Pretty simple.
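The existence check described above might look roughly like this in Python (a sketch, not the original bash script). It assumes the Filebeat 1.x registry layout (a JSON object keyed by source path) and that Filebeat is stopped while it runs; `prune_missing` and `prune_registry_file` are hypothetical names.

```python
import json
import os


def prune_missing(registry):
    """Keep only entries whose source file still exists on disk."""
    return {src: entry for src, entry in registry.items() if os.path.exists(src)}


def prune_registry_file(registry_path):
    """Rewrite the registry without entries for vanished files.

    Run only while Filebeat is stopped. Returns (kept, total).
    """
    with open(registry_path) as f:
        registry = json.load(f)
    pruned = prune_missing(registry)
    tmp_path = registry_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(pruned, f)
    os.replace(tmp_path, registry_path)  # atomic on the same filesystem
    return len(pruned), len(registry)
```

One limitation of checking by path: if a new file later appears under an old entry's path, the stale entry is kept.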

Please share with anyone who might be having this issue.


(ruflin) #6

This issue could potentially solve this problem; please have a look and let me know if it would help: https://github.com/elastic/beats/issues/1022


(Bill) #7

I'm tracking that issue, since my issue (1341) has been referenced in it.

I think that as long as the registry is cleaned up once the ignore_older period is reached (I agree that each registry record needs a timestamp for this), the faster-rotating log files will be able to rotate without running into the inode reuse issue.
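If registry entries carried a timestamp, the cleanup described above could be as simple as the following sketch. The `timestamp` field (Unix seconds of the last offset update) is hypothetical, since the 1.x registry has none, and `expire_entries` is an illustrative name. Entries without a timestamp are kept because their age is unknown.

```python
import time


def expire_entries(registry, ignore_older_seconds, now=None):
    """Drop entries not updated within ignore_older_seconds.

    Assumes a hypothetical per-entry "timestamp" (Unix seconds of the last
    offset update); Filebeat 1.x entries do not have one. Entries without a
    timestamp are kept because their age is unknown.
    """
    if now is None:
        now = time.time()
    return {
        src: entry
        for src, entry in registry.items()
        if "timestamp" not in entry or now - entry["timestamp"] <= ignore_older_seconds
    }
```

For example, expire_entries(registry, 24 * 3600) would drop entries idle for more than a day. Passing float("inf"), matching the 1.2 'infinite' default mentioned earlier in the thread, expires nothing.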

