S3 input plugin failed to stream from Glaciers

lcui_dxc · August 7, 2018, 7:50pm

Hello there,

We just set up ELK to stream events/logs from S3 to Logstash using the S3 input plugin (all components at version 6.3.2).
We can stream all standard files, but we have a lifecycle rule enabled that moves all files older than 90 days to Glacier. From what I have read online, the S3 input plugin seems to explicitly forbid the GLACIER storage class, and I don't see any workaround.

Here is one of many such entries in the Logstash log:

S3 input: Unable to download remote file {:remote_key=>"AWSLogs/xxx/CloudTrail/us-east-1/xxxx/xx/xx/xxxxxx_CloudTrail_us-east-1_20171105T0430Z_bbLZTyIdIkYfp2Su.json.gz", :message=>"The operation is not valid for the object's storage class"}

Is there any workaround for this? We cannot change the lifecycle rules.

Thank you very much

yaauie · August 8, 2018, 12:16am

Data that is archived from S3 to Glacier cannot be read directly by any of Amazon's APIs; the resource must first be restored to S3 (which will cost you differing amounts of money depending on how quickly you need it restored).

From Amazon's Documentation:

Objects archived to Amazon Glacier are not accessible in real-time. You must first initiate a restore request and then wait until a temporary copy of the object is available for the duration (number of days) that you specify in the request. The time it takes restore jobs to complete depends on which retrieval option you specify: Standard, Expedited, or Bulk.

-- Amazon S3 Docs -- Restoring Objects

The S3 Input Plugin does not currently provide support for restoring archived objects, and merely skips archived objects that it encounters. If this feature were implemented in the future, significant care would need to be taken to ensure that a poorly-configured pipeline couldn't go rogue and repeatedly restore the same objects, since each restoration of an object from Glacier costs real money.
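
To make the "don't restore the same object twice" concern concrete, here is a rough Python sketch of a guarded restore. It is not part of the plugin and not how the plugin works. It assumes a boto3-style S3 client is passed in (anything exposing `head_object` and `restore_object` with the usual keyword arguments), and the function name `restore_if_needed` and its return strings are made up for illustration.

```python
from typing import Any

# Retrieval options named in the Amazon documentation quoted above.
VALID_TIERS = {"Standard", "Expedited", "Bulk"}


def restore_if_needed(client: Any, bucket: str, key: str,
                      days: int = 7, tier: str = "Bulk") -> str:
    """Request a temporary restore only if none is running or finished.

    `client` is assumed to behave like a boto3 S3 client.
    Returns one of: "not_archived", "in_progress",
    "already_restored", "requested" (hypothetical labels).
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")

    head = client.head_object(Bucket=bucket, Key=key)

    # Objects in the default storage class carry no StorageClass field.
    if head.get("StorageClass") != "GLACIER":
        return "not_archived"

    # S3 reports restore state as a string such as
    # 'ongoing-request="true"' or 'ongoing-request="false", expiry-date="..."'.
    restore_state = head.get("Restore", "")
    if 'ongoing-request="true"' in restore_state:
        return "in_progress"
    if 'ongoing-request="false"' in restore_state:
        return "already_restored"

    # No restore has been requested yet, so issue exactly one.
    client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            "Days": days,
            "GlacierJobParameters": {"Tier": tier},
        },
    )
    return "requested"
```

Checking the object's restore state before each request is what keeps a misbehaving loop from paying for the same restoration repeatedly; a real pipeline would probably also want a persistent record of what it has already asked for.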

lcui_dxc · August 8, 2018, 3:53am

Yaauie,

Thank you for your response. You are right, the Logstash S3 plugin cannot read directly from Glacier, and we don't want to read data from Glacier either.
Could you please let me know how to skip files that are in Glacier when using the Logstash S3 input plugin? I tried setting Storage_class => Standard, but it does not seem to work. We use Logstash S3 input plugin v3.3.7.

Thanks a lot

yaauie · August 8, 2018, 6:29pm

I've opened up a PR on the plugin to add support for skipping glacier-archived objects: https://github.com/logstash-plugins/logstash-input-s3/pull/160
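
For anyone who wants to see the general idea of "skipping archived objects", here is a small Python sketch. It is only an illustration and is not the code from the PR (the plugin itself is written in Ruby). It assumes the entries look like the `Contents` items of an S3 object listing, each with a `Key` and a `StorageClass` field; the names `readable_objects` and `ARCHIVED_CLASSES` are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Storage classes that cannot be downloaded without a restore.
# Only GLACIER is discussed in this thread; extend as needed.
ARCHIVED_CLASSES = frozenset({"GLACIER"})


def readable_objects(contents: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only listing entries whose storage class can be read directly."""
    for obj in contents:
        # Treat a missing StorageClass as STANDARD.
        if obj.get("StorageClass", "STANDARD") in ARCHIVED_CLASSES:
            continue  # skip instead of attempting a download that will fail
        yield obj


# Example listing with made-up keys.
listing = [
    {"Key": "logs/new.json.gz", "StorageClass": "STANDARD"},
    {"Key": "logs/old.json.gz", "StorageClass": "GLACIER"},
]
keys = [obj["Key"] for obj in readable_objects(listing)]
```

Filtering on the listing's storage class avoids the failed download attempt altogether, so the "operation is not valid for the object's storage class" error never gets triggered for archived keys.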

system · September 5, 2018, 6:38pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
