I am using Logstash to read files from an S3 bucket and insert them into Elasticsearch.
I am facing 2 issues right now:

1. If an S3 file is empty, Logstash does not delete it and move on; instead it raises an error.
2. If a file is corrupt (half readable), the same file is inserted again and again. The S3 file is not deleted, so it is read repeatedly.
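For context, my pipeline is roughly along these lines; the bucket, prefix, and index names below are placeholders, not my real values:

```
input {
  s3 {
    bucket => "my-log-bucket"   # placeholder bucket name
    region => "us-east-1"       # placeholder region
    prefix => "logs/"           # placeholder key prefix
    delete => true              # files should be removed once read
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "s3-logs"          # placeholder index name
  }
}
```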
Regarding (1), what error are you getting? I can't see anything in the LS code that detects an empty file and raises an error.
Regarding (2), if you can't delete the bad S3 file in the bucket, you will have to edit the sincedb file. It stores the string representation of the last-modified time of the last file completely read; files whose last-modified time is at or before that value are skipped. So you will need to find the last-modified time of the corrupt file in AWS and write that time (or a later one) into the sincedb file.
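If I remember right, the file holds a single line, the string form of that timestamp, something like:

```
2018-03-14 09:26:53 +0000
```

Overwrite that line with the corrupt file's last-modified time (the date above is only an example) and the input should skip the file on the next run.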
You can find the sincedb file at $HOME/.sincedb_[some hex characters] unless you specified sincedb_path in the config.
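If you did set it explicitly, it will be wherever you pointed it, e.g. something like this in the input block (the path here is arbitrary):

```
input {
  s3 {
    bucket       => "my-log-bucket"                  # placeholder
    sincedb_path => "/var/lib/logstash/s3.sincedb"   # explicit location instead of the $HOME default
  }
}
```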