How does sincedb_clean_after in file{} work?

Hi. I have been reading https://www.elastic.co/guide/en/logstash/master/plugins-inputs-file.html for a while, and stumbled upon the following statement: "Sincedb records can now be expired meaning that read positions of older files will not be remembered after a certain time period." I'm aware that this points to the sincedb_clean_after option made available in March/April 2018. However, I'd like to confirm that it actually works on my system. Is it expected that inode records get deleted from sincedb, or does Logstash check whether the inode entries have expired at each discovery interval?

What is the expected behavior of this option? See the reply below for why I'm looking into this.
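For reference, my understanding from the docs is that each sincedb record carries a last-activity timestamp, and that records idle for longer than sincedb_clean_after are simply not persisted on the next sincedb write (rather than being checked at every discovery pass). A sincedb v2 line is whitespace-separated; the column layout and the sample values below are my assumption, not taken from a real system:

```python
# Hypothetical sincedb v2 record: inode, major device, minor device,
# byte offset, last-activity timestamp, last known path.
line = "1234567 0 51713 524288 1545638400.0 /var/log/rsyslog/app.2018-12-24-08:00.log"

fields = line.split(None, 5)
inode = fields[0]
offset = int(fields[3])          # position Logstash resumes reading from
last_active = float(fields[4])   # compared against sincedb_clean_after
path = fields[5]

print(inode, offset, last_active, path)
```

If expiry is working, an inode's line should disappear from the on-disk sincedb some time after the file has been inactive past the threshold.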

Background on why I'm looking into this:
I believe there is still a byte offset that is wrongly set in my test environment. Occasionally, log lines are not read from the beginning. E.g., "2018-12-24 host1 application2: did something" would be read into Logstash as "-24 host1 application2: did something".

The folder which Logstash watches is subject to the following:

  • Rsyslog receives data and saves it as logtype.YYYY-MM-DD-HH:MM.log
    • Files increase in size over time, and are generated each minute.
    • Files are roughly 500 MB during working hours, 100 MB at night.
  • Files get gzipped if they are older than 6 hours
    • .gz files are excluded
  • If the folder's size exceeds 40% of the partition's size, files are removed

During my testing, the files that exhibited this behavior were still available, and I confirmed that the log lines look fine on disk. In other words, the files that were read incorrectly had not been deleted or gzipped. File deletions are expected to occur after roughly 7 hours.

Options set in file{}:
sincedb_clean_after => 3.2 hours
ignore_older => 3 hours
close_inactive => 1 minute
mode => tail
exclude => .gz
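For completeness, here is roughly how those options look as an actual file{} input block. The path is hypothetical, and the quoting/string-duration syntax is my assumption (check which value types your Logstash version accepts for these settings):

```ruby
input {
  file {
    path => "/var/log/rsyslog/*.log"     # hypothetical watch path
    mode => "tail"
    exclude => "*.gz"                    # skip already-gzipped files
    sincedb_clean_after => "3.2 hours"   # expire idle sincedb records
    ignore_older => "3 hours"            # skip files not modified recently
    close_inactive => "1 minute"         # release idle file handles
  }
}
```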

Logstash version: 6.5.2
CentOS 7

I have already verified that the logs do not contain any well-hidden 0x0a/0x0d chars that could be misinterpreted as newline.

I think I may have found the issue, as I have not had parse failures since. Increasing close_inactive seems to be the fix for me. I wonder if it is because files were initially opened, but Logstash was unable to finish reading and close the file handle within the expected 1 minute. Thus, files were perhaps only partially read?

This seems like a likely cause, as files may stop growing before Logstash closes the file handle. With no further size increase, the file then ages past ignore_older and is never picked up again.
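In case it helps anyone with the same symptom, the change amounts to raising close_inactive so the handle stays open long enough to finish reading each file. The value below is just an example, not a recommendation:

```ruby
file {
  # ... other options unchanged ...
  close_inactive => "10 minutes"   # example value; was "1 minute"
}
```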

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.