I have configured Logstash to read from an AWS S3 bucket and send the data to Elasticsearch. The bucket also has a lifecycle policy that archives objects to Glacier once they are one month old. When Logstash runs with the S3 input plugin and encounters a Glacier object, it crashes and no longer pushes any data. Is there a way to skip Glacier objects, or is there a workaround for this?
How does it "crash"? Does all of Logstash crash, or is the pipeline merely restarted? Is there any helpful log output? When running with debug-level logging enabled, are there any backtraces in the logs? These would all be helpful.
If you do, please be sure to include reproduction steps that are as clear as possible (setting up a public bucket with one or more files in the states described would be super helpful).
I have an S3 bucket that contains objects in both the Glacier and standard S3 storage classes. When we start Logstash with your S3 input plugin listening to this bucket, the plugin fails with the error below, and from then on no events are sent to Elasticsearch.
[2018-08-09T16:02:43,887][ERROR][logstash.pipeline ] A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::S3 bucket=>"abcd", prefix=>"input/", access_key_id=>"XXXXXXXXXXXXXXXX", secret_access_key=>"XXXXXXXXXXXXXXX", region=>"us-east-1", temporary_directory=>"/home/logstash-5.4.1/tmp/logstash", id=>"0475943184b1d0293ba2409b3baf36d958-1", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_41a047bc-9ea2-4a9b-b16c-a86790633cb3", enable_metric=>true, charset=>"UTF-8">, delete=>false, interval=>60>
Error: The operation is not valid for the object's storage class
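To confirm which keys under the watched prefix have actually been transitioned to Glacier, a quick listing of storage classes can help. This is a minimal boto3 sketch, not part of the original report; it assumes AWS credentials are available in the environment and reuses the bucket name and prefix from the plugin config above:

```python
# Hypothetical diagnostic: list objects under the watched prefix and print
# their storage class, to see which keys have been archived to Glacier.
# Assumes boto3 is installed and AWS credentials are configured; "abcd" and
# "input/" are the bucket and prefix from the plugin config shown above.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket="abcd", Prefix="input/"):
    for obj in page.get("Contents", []):
        # StorageClass is "GLACIER" for archived objects, "STANDARD" otherwise.
        print(obj["Key"], obj.get("StorageClass", "STANDARD"))
```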
Yes. The feature to skip S3 objects that have been archived to Glacier is currently in a pull request that has not been merged and is therefore not yet available.
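Until that lands, one possible workaround is to keep Glacier-class objects out of the plugin's view, either by moving them to a prefix Logstash does not watch or by requesting a temporary restore so they can be read again. The following is a rough boto3 sketch of the restore approach, not something from this thread; note that restores are asynchronous, so the S3 input will still hit the same error until each restore has completed:

```python
# Hypothetical workaround sketch: request a temporary restore for every
# Glacier object under the watched prefix so the S3 input can read it again.
# Assumes boto3 and AWS credentials; bucket/prefix taken from the config above.
# Restores take time (hours for the Standard tier), so keep Logstash stopped
# or the prefix excluded until they finish.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="us-east-1")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket="abcd", Prefix="input/"):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass") != "GLACIER":
            continue
        try:
            s3.restore_object(
                Bucket="abcd",
                Key=obj["Key"],
                RestoreRequest={
                    "Days": 7,  # keep the restored copy readable for a week
                    "GlacierJobParameters": {"Tier": "Standard"},
                },
            )
            print("restore requested for", obj["Key"])
        except ClientError as err:
            # RestoreAlreadyInProgress is returned if a restore is already pending.
            print("skipping", obj["Key"], err.response["Error"]["Code"])
```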