Logstash 7.3.0 File Input Plugin not Reading Recreated Files

Hi,

I have a bunch of JSON files that Logstash reads and then deletes, and those files eventually get recreated (with either the same contents or slightly different contents). I'd then like Logstash to read the new/recreated files, but for some reason it discovers them and then ignores them. I'm using the following input:

    input {
            file {
                    path => "/dir/intake/edi_job_status/*/*.json"
                    mode => "read"
                    file_completed_action => "delete"
                    sincedb_path => "/dev/null"
                    stat_interval => "30 seconds"
                    discover_interval => 1
                    codec => "json"
            }
    }

It's my understanding that this is supposed to read the files, delete them, and then check for new files every 30 seconds. The first two steps work fine, and I'm able to see all of my expected results in Kibana. But after it reads the files once, it never reads them again, even when they're recreated. Using trace-level logging I can see that it does indeed discover the new files when they're recreated:

[2019-10-28T16:16:47,098][TRACE][filewatch.discoverer     ] discover_files {"count"=>0}
[2019-10-28T16:17:17,099][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-28T16:17:17,099][TRACE][filewatch.readmode.processor] Active processing
[2019-10-28T16:17:17,107][TRACE][filewatch.discoverer     ] discover_files {"count"=>79}
[2019-10-28T16:17:39,364][TRACE][logstash.agent           ] Converge results {:success=>true, :failed_actions=>[], :successful_actions=>[]}
[2019-10-28T16:17:47,112][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-28T16:17:47,113][TRACE][filewatch.readmode.processor] Active processing
[2019-10-28T16:17:47,124][TRACE][filewatch.discoverer     ] discover_files {"count"=>186}
[2019-10-28T16:18:17,127][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-28T16:18:17,128][TRACE][filewatch.readmode.processor] Active processing
[2019-10-28T16:18:17,144][TRACE][filewatch.discoverer     ] discover_files {"count"=>201}
[2019-10-28T16:18:47,148][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-28T16:18:47,149][TRACE][filewatch.readmode.processor] Active processing
[2019-10-28T16:18:47,168][TRACE][filewatch.discoverer     ] discover_files {"count"=>201}
[2019-10-28T16:19:17,172][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-28T16:19:17,173][TRACE][filewatch.readmode.processor] Active processing

but for some reason it doesn't read them.
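
In case it helps reproduce this, the recreate side is essentially doing the following (a simplified sketch; the real paths and payloads differ):

    import json
    import time
    from pathlib import Path

    # Simplified stand-in for the process that recreates the files;
    # "job1" and the payload here are made up for illustration.
    job_file = Path("/dir/intake/edi_job_status/job1/status.json")

    while True:
        if not job_file.exists():
            job_file.parent.mkdir(parents=True, exist_ok=True)
            # Same (or slightly different) contents each time.
            job_file.write_text(json.dumps({"job": "job1", "state": "done"}) + "\n")
        time.sleep(60)  # Logstash reads and deletes the file in between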

Any help would be appreciated.

You are probably hitting this issue. Deciding whether a re-used inode is a new file or not is an extremely hard problem. There are changes that could be made to the file input to improve it.
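
Here's a quick Python sketch of what I mean (the /tmp path is made up): delete a file, recreate it, and on many filesystems you get the very same inode back. The file input keys its sincedb on inode and device numbers rather than path or content, so the recreated file looks like one it has already read:

    import os

    path = "/tmp/inode_reuse_demo.json"  # made-up path for the demo

    # Create a file and record its inode.
    with open(path, "w") as f:
        f.write('{"status": "done"}\n')
    before = os.stat(path).st_ino

    # Delete it and immediately recreate it under the same name.
    os.remove(path)
    with open(path, "w") as f:
        f.write('{"status": "done"}\n')
    after = os.stat(path).st_ino

    # On many filesystems the freed inode is handed straight back, so the
    # sincedb key (inode + device numbers) matches an already-read entry.
    print(before, after, "inode reused" if before == after else "fresh inode")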

Ah yeah, my colleague actually found that thread just after I posted this question. Now I'm trying to figure out a good workaround using options like sincedb_clean_after => "30 seconds", but that doesn't seem to be helping either.

Yeah, you may be hitting this issue.

Thanks for pointing that out. Unfortunately, even using a tiny value like the 0.0001 seconds suggested in that thread doesn't seem to help. It looks like the sincedb does get cleaned:

[2019-10-29T09:26:11,651][TRACE][filewatch.sincedbcollection] sincedb_write: to: /dev/null
[2019-10-29T09:26:11,653][TRACE][filewatch.sincedbcollection] sincedb_write: cleaned {"key"=>"'628709 0 64775'"}
[2019-10-29T09:26:11,655][TRACE][filewatch.sincedbcollection] sincedb_write: cleaned {"key"=>"'628701 0 64775'"}
[2019-10-29T09:26:11,655][TRACE][filewatch.sincedbcollection] sincedb_write: cleaned {"key"=>"'628705 0 64775'"}
[2019-10-29T09:26:11,655][TRACE][filewatch.sincedbcollection] sincedb_write: cleaned {"key"=>"'628710 0 64775'"}
...
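
(If I'm reading those trace lines right, each cleaned key is just the file's identity as "inode major_dev minor_dev", with no path or content in it, e.g.:)

    # Hypothetical parse of one cleaned key from the trace above; the three
    # fields are inode, major device number, and minor device number.
    inode, major_dev, minor_dev = "628709 0 64775".split()
    print(inode, major_dev, minor_dev)  # -> 628709 0 64775

Worth noting that with sincedb_path => "/dev/null" the on-disk sincedb is thrown away anyway, so whatever is remembering these files must live in memory.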

but after that it just discovers the newly recreated files and still doesn't read them:

[2019-10-29T09:26:41,712][TRACE][filewatch.discoverer     ] discover_files {"count"=>0}
[2019-10-29T09:27:11,712][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-29T09:27:11,713][TRACE][filewatch.readmode.processor] Active processing
[2019-10-29T09:27:11,717][TRACE][filewatch.discoverer     ] discover_files {"count"=>48}
[2019-10-29T09:27:31,371][TRACE][logstash.agent           ] Converge results {:success=>true, :failed_actions=>[], :successful_actions=>[]}
[2019-10-29T09:27:41,718][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-29T09:27:41,718][TRACE][filewatch.readmode.processor] Active processing
[2019-10-29T09:27:41,730][TRACE][filewatch.discoverer     ] discover_files {"count"=>150}
[2019-10-29T09:28:11,731][TRACE][filewatch.readmode.processor] Watched processing
[2019-10-29T09:28:11,732][TRACE][filewatch.readmode.processor] Active processing
[2019-10-29T09:28:11,739][TRACE][filewatch.discoverer     ] discover_files {"count"=>201}

This is using:

    input {
            file {
                    path => "/dir/intake/edi_job_status/*/*.json"
                    mode => "read"
                    file_completed_action => "delete"
                    sincedb_path => "/dev/null"
                    stat_interval => "30 seconds"
                    discover_interval => 1
                    sincedb_clean_after => "0.0001 seconds"
            }
    }
