Logstash S3 Input - How does it know where to start?

Shahzaib_Gill · April 10, 2018, 12:15am

If I define an S3 bucket as input, and Logstash processes a few files before being shut down, how does it know where to begin so that it doesn't process the same data again? For example, if I have 2 files in my S3 bucket and Logstash has already pushed them to my specified output (Elasticsearch), it won't re-process them even if I delete the data from the output (delete the Elasticsearch index).

From what I see, the sincedb file stores a date (it keeps track of the date the last handled file was added to S3). Does Logstash use this date to figure out which files have been processed? What if N files were pushed to S3 at the same time, and I stopped Logstash when it was done processing N-1 of them? Would it re-process all N files when I start Logstash again?

Thank you

yaauie · April 10, 2018, 12:39am

From looking at the source, I see that the plugin modifies the sincedb with a file's last modified timestamp when it is done reading the file.

When it lists the bucket's files into local memory, it compares each file's last modified timestamp against this value to determine which files are new.

Theoretically, this means that if N>1 files are uploaded to S3 with the same timestamp, and the Logstash process quits or exits after reading the first file but before consuming the remaining files with that identical timestamp, those remaining files will be skipped when Logstash starts back up.
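
To make the scenario concrete, here is a minimal Python sketch of that bookkeeping. It is not the plugin's code (the plugin is written in Ruby); `S3Object`, `pending_objects`, and `run` are hypothetical names, and it assumes a strict "newer than" comparison against a single stored timestamp, which is what the skipping behavior described above implies.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class S3Object(NamedTuple):
    key: str
    last_modified: datetime


def pending_objects(objects, sincedb_time):
    """Keep objects strictly newer than the stored timestamp, oldest first."""
    newer = [
        o for o in objects
        if sincedb_time is None or o.last_modified > sincedb_time
    ]
    return sorted(newer, key=lambda o: o.last_modified)


def run(objects, sincedb_time, stop_after=None):
    """Simulate one Logstash run; stop_after mimics the process exiting early.

    Returns the keys that were processed and the updated sincedb timestamp.
    """
    processed = []
    for obj in pending_objects(objects, sincedb_time):
        if stop_after is not None and len(processed) >= stop_after:
            break
        processed.append(obj.key)
        # The stored timestamp advances as soon as one file is done.
        sincedb_time = obj.last_modified
    return processed, sincedb_time


# Three files uploaded with an identical last-modified timestamp (assumed data).
ts = datetime(2018, 4, 10, 0, 0, 0, tzinfo=timezone.utc)
bucket = [S3Object("a.log", ts), S3Object("b.log", ts), S3Object("c.log", ts)]

first_run, sincedb = run(bucket, None, stop_after=1)  # exits after one file
second_run, sincedb = run(bucket, sincedb)            # restart

print(first_run)   # ['a.log']
print(second_run)  # []
```

After the restart, `b.log` and `c.log` are not newer than the stored timestamp, so they are never picked up. Tracking processed keys alongside the timestamp would be one way to avoid this, at the cost of a larger sincedb.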

That would explain this bug, filed back in October 2015.

system · May 8, 2018, 12:39am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
