From what I've seen, the S3 input's last_run is a date. I didn't think you could use the S3 API to fetch only objects stored from a certain time onward. I thought it needed the key of the last object processed as a marker. Is that not correct?
How does the S3 input plugin accurately keep track of where it left off?
The s3 input does not have a last_run option in its current incarnation (I'm not sure about the history here). It uses sincedb_path to persist data about what it has processed.
That's what I meant: sincedb. I think that file contains a date. How can that work with S3? I thought S3 only works with the last key that was processed, not a date.
My understanding is that the s3 input fetches every object from the bucket and compares each object's last_modified metadata with the timestamp in the sincedb. If watch_for_new_files is set, it will do that over and over, which is why you might want one of the backup options together with the delete option, so that once an object is processed it is no longer fetched again on every pass.
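To make that concrete, here is a rough Python sketch of the logic as I understand it. It is not the plugin's actual code (the plugin is written in Ruby), and the names S3Object, select_new_objects and poll_once are made up for illustration. The listing is just an in-memory list standing in for a full bucket listing. The point is that no key marker is needed: every pass looks at the whole listing and filters by date.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class S3Object:
    # Hypothetical stand-in for one entry of a bucket listing.
    key: str
    last_modified: datetime


def select_new_objects(listing, sincedb_time):
    """Return objects modified after the sincedb timestamp, oldest first."""
    newer = [obj for obj in listing if obj.last_modified > sincedb_time]
    return sorted(newer, key=lambda obj: obj.last_modified)


def poll_once(listing, sincedb_time):
    """Simulate one pass: scan the whole listing, process what is newer,
    and advance the sincedb timestamp after each processed object."""
    processed = []
    for obj in select_new_objects(listing, sincedb_time):
        processed.append(obj.key)          # assumption: "processing" is just recording the key
        sincedb_time = obj.last_modified   # what would be written to the sincedb file
    return processed, sincedb_time


if __name__ == "__main__":
    utc = timezone.utc
    listing = [
        S3Object("logs/b.log", datetime(2024, 1, 3, tzinfo=utc)),
        S3Object("logs/a.log", datetime(2024, 1, 2, tzinfo=utc)),
        S3Object("logs/old.log", datetime(2023, 12, 31, tzinfo=utc)),
    ]
    sincedb = datetime(2024, 1, 1, tzinfo=utc)
    keys, sincedb = poll_once(listing, sincedb)
    print(keys, sincedb.isoformat())
```

Note that the second pass still walks the full listing even though nothing is new, which is the cost that moving or deleting processed objects avoids.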