Logstash ingested a CSV file twice

For some reason Logstash processed a CSV file twice. Could you please help me find out why?

Running Logstash version 6.5.3.

input {
  file {
    path => "/path/to/*.csv"
    start_position => "beginning"
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/path/to/logs/processed.log"
  }
}

File "/path/to/logs/processed.log" contains two records of "/path/to/input.csv" when it should be only one!
Also, there are double amount of documents in ElasticSearch with duplicates from this file.

The sincedb should prevent this. You would have to provide more information, including the matching sincedb entries and the entries from the file completed log. Did you restart Logstash?

Yes, sincedb should've prevented this.

There are two entries for "/path/to/input.csv" in "/path/to/logs/processed.log".
And there is one entry for "/path/to/input.csv" in /var/lib/logstash/plugins/inputs/file/.sincedb_ like this:
4278179 0 64769 758 1560832674.226105 "/path/to/input.csv"

No, I haven't restarted Logstash.

Well, I just found out that Logstash re-reads the CSV file each time it restarts!
It recreates the .sincedb file on startup and processes the file from the beginning again.

Is there a bug in Logstash that has been fixed in later versions? Or is it "by design"?

Now I'm thinking of a workaround: a bash script that moves files to a different directory after a certain amount of time (so as not to lose them).
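
Something like this is what I have in mind; the directories and the one-hour threshold below are just placeholders, not values from my actual setup:

#!/usr/bin/env bash
# Move CSVs that have not been modified for an hour out of the input
# directory so that a restarted Logstash cannot pick them up again.
# NOTE: the paths and the -mmin threshold are examples only.
set -euo pipefail

INPUT_DIR="/path/to"
ARCHIVE_DIR="/path/to/archive"

mkdir -p "$ARCHIVE_DIR"
find "$INPUT_DIR" -maxdepth 1 -name '*.csv' -mmin +60 \
  -exec mv {} "$ARCHIVE_DIR/" \;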

It's certainly not "by design". The only reason to have a sincedb on disk is to persist the in-memory sincedb across restarts so that the file input does not re-read files when Logstash is restarted.
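
In the meantime, one thing worth trying is setting sincedb_path explicitly in the input, so the sincedb is written to a fixed location that you know survives restarts (the path below is only an example):

input {
  file {
    path => "/path/to/*.csv"
    start_position => "beginning"
    mode => "read"
    # Example only: pin the sincedb to a known, persistent location
    sincedb_path => "/var/lib/logstash/sincedb_csv"
    file_completed_action => "log"
    file_completed_log_path => "/path/to/logs/processed.log"
  }
}

If the default sincedb location (under Logstash's data directory) is not persistent, for example because Logstash runs in a container or the data directory gets cleaned up, the file would be re-read after every restart, which would match what you are seeing.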
