I am attempting to use the file input plugin to "watch" a directory for inbound files. Gzipped log files are shipped to this directory from various remote hosts; the file input picks them up, decompresses or processes them "as is", and sends the data to an upstream collector (Graylog, Elasticsearch, etc.).
Ubuntu 24.04.1
Logstash 8.15.3
It is mostly working, but every now and then it will skip a file or set of files. I am pretty sure my config is valid because it works MOST of the time.
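For reference, a minimal sketch of the kind of config in play here (the paths, file pattern, and output host are illustrative placeholders, not my exact setup):

```
input {
  file {
    # Read whole files once rather than tailing them; read mode
    # can also handle gzipped files
    mode => "read"
    path => "/var/lib/logstash/data/*.gz"            # illustrative drop directory
    # Record each completed file, then delete it
    file_completed_action => "log_and_delete"
    file_completed_log_path => "/var/log/logstash/consumed.log"
    sincedb_path => "/var/lib/logstash/sincedb"      # illustrative location
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]               # placeholder upstream
  }
}
```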
Yes. If you have Logstash delete files after reading them, the inodes are freed up, and on some filesystems they go into a cache to be reused, which can make reuse quite common. Since the sincedb tracks files by inode, a new file that lands on a recycled inode can look like one that has already been read, so it gets skipped.
I would base the value of sincedb_clean_after on the maximum time you ever expect a file to stay in /var/lib/logstash/data/.
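For instance, if files never sit in that directory longer than a day, something like this gives plenty of headroom (the value of 1 is illustrative, not a recommendation):

```
input {
  file {
    mode => "read"
    path => "/var/lib/logstash/data/*.gz"
    file_completed_action => "log_and_delete"
    file_completed_log_path => "/var/log/logstash/consumed.log"
    # Expire the sincedb record for any file with no recent activity,
    # so a recycled inode cannot match a stale "already read" entry.
    # Per the file input docs the value is interpreted in days.
    sincedb_clean_after => 1
  }
}
```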
I wonder if I would be better off NOT using log_and_delete, letting files sit for a 24h period, and running a find /var/lib/logstash/data -mtime +1 -delete sort of thing?
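Something like this cron entry is what I have in mind (the schedule and name filter are just a sketch; note that -mtime +1 must come before -delete, otherwise find removes everything it visits):

```
# /etc/cron.d/logstash-cleanup (hypothetical): once an hour, remove
# files older than 24h that Logstash has already read and logged
0 * * * * root find /var/lib/logstash/data -type f -name '*.gz' -mtime +1 -delete
```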
So, I have changed file_completed_action to "log" ONLY, and I've set sincedb_clean_after to .005, which is super low. 28 log files have been shipped to the processing folder, and I have 28 entries in the consumed.log file.
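Roughly what the input looks like now (paths are still illustrative; 0.005 is the value from above):

```
input {
  file {
    mode => "read"
    path => "/var/lib/logstash/data/*.gz"
    # Keep the files on disk and just record completions, so inodes
    # are not freed and recycled while sincedb entries are still live
    file_completed_action => "log"
    file_completed_log_path => "/var/log/logstash/consumed.log"
    # Expire stale sincedb records very aggressively
    sincedb_clean_after => 0.005
  }
}
```

A quick way to keep checking that in/out ratio is comparing `ls /var/lib/logstash/data/*.gz | wc -l` against `wc -l < /var/log/logstash/consumed.log`.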
I will monitor this for a while, but that seems to be a better way to process logs without the risk of frequent inode reuse. Thanks @Badger for the tip on where to investigate.