Hi. I have been reading https://www.elastic.co/guide/en/logstash/master/plugins-inputs-file.html for a while, and came across the following statement: "Sincedb records can now be expired meaning that read positions of older files will not be remembered after a certain time period." I'm aware that this refers to the sincedb_clean_after option, made available in March/April 2018. However, I want to confirm that it actually works on my system. Is it expected that inode records get deleted from the sincedb file, or does Logstash check at each discovery interval whether the inode entries have expired?
What is the expected behavior of this option? See the reply below for why I'm looking into this.
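One way to check this on a test system would be to dump the sincedb file before and after the expiry period and see which inode entries are old enough to be dropped. Below is a minimal sketch, not a definitive tool. It assumes each sincedb line has the layout `inode major minor offset last_active_timestamp path`, which is my understanding of recent plugin versions and is not confirmed in this thread. `parse_sincedb_line` and `stale_entries` are hypothetical helper names.

```python
import time
from typing import Iterable, List, NamedTuple, Optional


class SincedbEntry(NamedTuple):
    inode: str
    offset: int
    last_active: Optional[float]  # epoch seconds; None if the column is absent
    path: Optional[str]


def parse_sincedb_line(line: str) -> SincedbEntry:
    # ASSUMED layout: inode major minor offset [timestamp [path]]
    # The path is kept as the last field so that spaces in it survive.
    parts = line.strip().split(" ", 5)
    if len(parts) < 4:
        raise ValueError(f"unexpected sincedb line: {line!r}")
    inode, _major, _minor, offset = parts[:4]
    last_active = float(parts[4]) if len(parts) > 4 else None
    path = parts[5] if len(parts) > 5 else None
    return SincedbEntry(inode, int(offset), last_active, path)


def stale_entries(lines: Iterable[str], max_age_seconds: float,
                  now: Optional[float] = None) -> List[SincedbEntry]:
    """Return entries whose last activity is older than max_age_seconds."""
    now = time.time() if now is None else now
    entries = [parse_sincedb_line(l) for l in lines if l.strip()]
    return [e for e in entries
            if e.last_active is not None
            and now - e.last_active > max_age_seconds]
```

Running `stale_entries` on the lines of the sincedb file with `max_age_seconds` set to the configured sincedb_clean_after period lists the entries that should no longer be present if expiry works as described.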
The background to why I'm looking into this is:
I believe a byte offset is still being set incorrectly in my test environment. Occasionally, log lines are not read from the beginning. For example, "2018-12-24 host1 application2: did something" would be read into Logstash as "-24 host1 application2: did something".
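To illustrate the symptom: if a reader resumes at a byte offset that was remembered for a different file, the first line it returns starts mid-line. The sketch below reproduces the example above with a stale offset of 7 bytes. It only shows the effect of a wrong offset and says nothing about how Logstash arrived at it; `read_from_offset` and `demo_stale_offset` are hypothetical names.

```python
import os
import tempfile


def read_from_offset(path: str, offset: int) -> str:
    """Read one line starting at a byte offset, as a tailer resuming would."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.readline().decode("utf-8").rstrip("\n")


def demo_stale_offset() -> str:
    line = b"2018-12-24 host1 application2: did something\n"
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        tmp.write(line)
        path = tmp.name
    try:
        # Hypothetical stale offset: 7 bytes were "already read" according
        # to a record that actually belonged to another file.
        return read_from_offset(path, 7)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_stale_offset())
```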
The folder which Logstash watches is subject to the following:
Rsyslog receives data and saves it as logtype.YYYY-MM-DD-HH:MM.log
Files increase in size over time, and are generated each minute.
File sizes are roughly 500 MB during working hours, 100 MB at night.
Files are gzipped if they are older than 6 hours
.gz files are excluded
If the size of the folder exceeds 40% of the partition's size, files are removed
During my testing, the files that showed this behavior were still available, and I confirmed that the log line looks fine on disk. In other words, the files that were read incorrectly had not been deleted or gzipped. File deletions are expected to occur after roughly 7 hours.
I think I may have found the issue, as I have not had parse failures since. Increasing close_inactive seems to be the fix for me. I wonder if files were being opened initially, but Logstash was unable to finish reading and close the file handle within the expected 1 min. Thus, the files were perhaps only partially read?
This seems like a likely cause, as files may stop increasing in size before Logstash closes the file handle. There is then no further size increase, so the file won't be detected within ignore_older.