Logstash file input periodically skips over files

I am attempting to use the file input plugin to "watch" a directory for inbound files. Gzipped log files are shipped to this directory from various remote hosts; the file input picks them up, decompresses or processes them "as is", and sends the data to an upstream collector (Graylog, Elasticsearch, etc.).

Ubuntu 24.04.1
Logstash 8.15.3

It is mostly working, but every now and then it will skip a file or set of files. I am pretty sure my config is valid because it works MOST of the time.

input {
    file {
        path => "/var/lib/logstash/data/*gz"
        start_position => "beginning"
        # read each file in full; gzipped files are decompressed in this mode
        mode => "read"
        # record each completed file here, then delete it from disk
        file_completed_log_path => "/var/lib/logstash/consumed.log"
        file_completed_action => "log_and_delete"
    }
}

Does anyone have ideas on how best to troubleshoot this, and/or know if I'm running into any known issues?

Thanks

That could be inode re-use. A low value for sincedb_clean_after might help.

Is that possible given that I'm getting files predictably every 10 minutes from various sources?

How "low" do you suggest?... i think the def. is 2w.

Maybe 15 minutes?

Yes. If you have Logstash delete files after reading them, then the inodes are freed up, and on some filesystems that will put them into a cache to be reused. This can make re-use quite common.

I would base the value of sincedb_clean_after on the maximum time you ever expect a file to stay in /var/lib/logstash/data/
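As a sketch, assuming a reasonably recent file input plugin (which accepts duration strings for this option; older versions take a plain number of days), it would slot into your existing input like this, with the "1h" value purely illustrative:

input {
    file {
        path => "/var/lib/logstash/data/*gz"
        mode => "read"
        # expire sincedb entries that have not been seen for an hour,
        # so a recycled inode is treated as a new file rather than "already read"
        sincedb_clean_after => "1h"
        file_completed_log_path => "/var/lib/logstash/consumed.log"
        file_completed_action => "log_and_delete"
    }
}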

I wonder if I would be better off NOT using log_and_delete, letting files hang around for a 24h period, and running a find /var/lib/logstash/data -mtime +1 -delete sort of thing?

My original thinking is that:

  • servers send logs every 10 minutes at varying times
  • Logstash consumes them as soon as it can
  • applies groks, filters, transformations, etc.
  • sends the data to the upstream Graylog for persistence
  • next() log file...

So, I am counting on Logstash to consume them as quickly as possible, then log_and_delete those it was able to process.
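For context, the rest of the pipeline is nothing exotic. A rough sketch of the filter/output side, where the grok pattern, host, and port are placeholders rather than my actual config, and assuming a GELF input on the Graylog end:

filter {
    # parse each line into timestamp, level, and message (example pattern only)
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
    }
    date {
        match => [ "ts", "ISO8601" ]
    }
}

output {
    # ship events to Graylog over GELF (host/port are placeholders)
    gelf {
        host => "graylog.example.com"
        port => 12201
    }
}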

That's where I'm at... so, I just set sincedb_clean_after to 1h. I'll see what that brings me. Interesting challenge, inode re-use...

So, I have changed to log ONLY, and I've set sincedb_clean_after to 0.005 days (about 7 minutes), which is super low. I have 28 log files that have been shipped to the processing folder and I have 28 entries in the consumed.log file.
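For reference, a sketch of what the input looks like now, assuming nothing else changed besides the two settings mentioned:

input {
    file {
        path => "/var/lib/logstash/data/*gz"
        start_position => "beginning"
        mode => "read"
        # 0.005 days is roughly 7 minutes, so sincedb entries are dropped quickly
        # and a recycled inode is not mistaken for an already-read file
        sincedb_clean_after => 0.005
        file_completed_log_path => "/var/lib/logstash/consumed.log"
        # only log completed files; deleting them from disk is handled separately
        file_completed_action => "log"
    }
}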

I will monitor this for a while, but that seems to be a better way to process logs without the risk of frequent inode re-use. Thanks @Badger for the tip on where to investigate.