If I define an S3 bucket as input, and Logstash processes a few files before being turned off, how does it know where to resume so that it doesn't process the same data again? For example, if I have 2 files in my S3 bucket and Logstash has already pushed them to my specified output (Elasticsearch), it won't re-process them even if I delete them from the output (i.e. delete the Elasticsearch index).
From what I can see, the sincedb file stores a date (it keeps track of when the last handled file was added to S3). Does Logstash use this date to figure out which files have already been processed? If N files were pushed to S3 at the same time and I stopped Logstash after it had processed N-1 of them, would it re-process all N files when I start it again?
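To make the question concrete, here is a simplified model of what I assume a date-only sincedb does. This is my guess, not the plugin's actual code; `S3Object`, `pending` and the `inclusive` flag are made-up names. It shows why the answer depends on how the stored date is compared: with a `>=` check a restart would re-process all N files, while with a strict `>` check it would skip the file that was never processed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class S3Object:
    # Hypothetical stand-in for an S3 listing entry
    key: str
    last_modified: datetime


def pending(objects: List[S3Object], sincedb: Optional[datetime], inclusive: bool) -> List[S3Object]:
    """Return the objects a date-only sincedb would (re)process.

    inclusive=True models a '>=' comparison, False a strict '>' comparison.
    This is an assumed model, not Logstash's implementation.
    """
    if sincedb is None:
        # No sincedb yet: everything is new
        keep = list(objects)
    elif inclusive:
        keep = [o for o in objects if o.last_modified >= sincedb]
    else:
        keep = [o for o in objects if o.last_modified > sincedb]
    # Process oldest first, breaking timestamp ties by key
    return sorted(keep, key=lambda o: (o.last_modified, o.key))


# N = 3 objects uploaded with the same last-modified timestamp
t = datetime(2016, 1, 1, 12, 0, 0)
objs = [S3Object(f"log-{i}.gz", t) for i in range(3)]

# Logstash stopped after N-1 objects; the sincedb now holds their timestamp
sincedb = t

print([o.key for o in pending(objs, sincedb, inclusive=True)])   # ['log-0.gz', 'log-1.gz', 'log-2.gz']
print([o.key for o in pending(objs, sincedb, inclusive=False)])  # []
```

Neither outcome is what I want (duplicates in one case, a lost file in the other), which is why I'd like to know how the real input plugin handles this case.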
Thank you