Logstash ingested CSV file twice

victoravr · June 18, 2019, 11:06pm

For some reason Logstash processed a CSV file twice. Could you please help me find out why?

Running logstash.version 6.5.3.

input {
  file {
    path => "/path/to/*.csv"
    start_position => "beginning"
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/path/to/logs/processed.log"
  }
}

File "/path/to/logs/processed.log" contains two records of "/path/to/input.csv" when it should be only one!
Also, there are twice as many documents in Elasticsearch as expected, with duplicates from this file.

Badger · June 18, 2019, 11:15pm

The sincedb should prevent this. You would have to provide more information, including matching sincedb entries and entries from the file completed log. Did you restart logstash?

victoravr · June 18, 2019, 11:32pm

Yes, sincedb should've prevented this.

There are two entries of "/path/to/input.csv" in "/path/to/logs/processed.log".
And there is one entry for "/path/to/input.csv" in /var/lib/logstash/plugins/inputs/file/.sincedb_ like this:
4278179 0 64769 758 1560832674.226105 "/path/to/input.csv"

No, I haven't restarted logstash.

victoravr · June 19, 2019, 4:28am

Well, I just found out that Logstash reads the CSV file each time it restarts!
It recreates the .sincedb file on startup and processes the file from the beginning.

Is there a bug in Logstash that has been fixed in later versions? Or is it "by design"?

Now I'm thinking of a workaround: writing a bash script that moves files to a different directory after a certain time (so as not to lose them).

Badger · June 19, 2019, 12:47pm

It's certainly not "by design". The only reason to have a sincedb on disk is to persist the in-memory sincedb across restarts so that the file input does not re-read files when logstash is restarted.
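
One thing that might be worth trying, as a sketch only and not a confirmed fix for this case: set sincedb_path explicitly in the file input, so it is obvious which file holds the read positions and whether it survives a restart. The sincedb location below is a hypothetical path chosen for illustration; the other settings are copied from the config posted above.

```
input {
  file {
    path => "/path/to/*.csv"
    start_position => "beginning"
    mode => "read"
    # Hypothetical location (an assumption, not from the original config).
    # It must be a file path, not a directory, and the user running
    # Logstash needs write access to it.
    sincedb_path => "/path/to/logs/csv_input.sincedb"
    file_completed_action => "log"
    file_completed_log_path => "/path/to/logs/processed.log"
  }
}
```

Comparing the contents of that file before stopping Logstash and after starting it again should show whether the entry for "/path/to/input.csv" is really being lost on restart.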

system · July 17, 2019, 12:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
