S3 Input Plugin Choking on Glacier Files

zwalls · August 26, 2015, 3:16pm

Up-front info:
Distro - Amazon Linux
Logstash - yum-installed
Elasticsearch - yum-installed
These are on the same box, at present.

I am currently using the s3 input plugin to ingest .gz'ed catalina log files. Recently, due to lifecycle rules on the s3 bucket in question, several of these files have been rotated to Glacier. When standing up a new test box to try out some modifications to my grok filters, I noticed that Logstash fails to ingest any data from this bucket and logs the following error in logstash.log (bucket name and keys redacted to protect the innocent):

{:timestamp=>"2015-08-26T03:35:01.253000+0000", :message=>"A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::S3 bucket=>"REDACTED", access_key_id=>"REDACTED", secret_access_key=>"REDACTED", debug=>false, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, region=>"us-east-1", use_ssl=>true, delete=>false, interval=>60, temporary_directory=>"/var/lib/logstash/logstash">\n Error: The operation is not valid for the object's storage class", :level=>:error}

Some digging revealed that, in the AWS web interface, several of the items in the bucket are now listed with Storage Class "Glacier." Unfortunately, when I list the contents of the bucket with the AWS CLI, I receive no information or tags designating the storage class of the files listed.

When I look in the temporary directory Logstash is using (/var/lib/logstash/logstash/), I see a 0 KB file whose name matches one of the files in the bucket that is designated as Glacier.

So, here's what I think is happening:

1. s3 plugin accesses the bucket
2. s3 plugin lists items in bucket, receiving a list of files with no information regarding storage class
3. s3 plugin tries a GET on one of the Glacier-class files (with no idea that it's not a regular s3 file)
4. s3 returns a "Hey, you can't do that" message.
5. Repeat 1-4
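
For illustration, the listing step could be sketched outside Logstash as below. This is a rough Python sketch, not the plugin's code (the plugin is Ruby). It assumes each entry in the list response carries a StorageClass field, as the S3 ListObjects API response does; the helper names and the set of archived classes are hypothetical choices made for the example.

```python
# Storage classes whose objects cannot be read with a plain GET until restored.
# Assumption for illustration; adjust to the classes your bucket uses.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def readable_keys(contents):
    """Return keys from a ListObjects-style 'Contents' list that are not archived.

    Entries without a StorageClass field are treated as STANDARD.
    """
    return [
        obj["Key"]
        for obj in contents
        if obj.get("StorageClass", "STANDARD") not in ARCHIVED_CLASSES
    ]


def list_readable_keys(bucket, prefix=""):
    """List a real bucket and drop archived keys.

    Assumes boto3 is installed and credentials are configured; it is imported
    lazily so readable_keys() can be used without it.
    """
    import boto3

    s3 = boto3.client("s3")
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(readable_keys(page.get("Contents", [])))
    return keys
```

With a check like this in step 2, the GET in step 3 would only be attempted for keys that are not in an archived class.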

What I'd like to do is to tell the s3 plugin to either recognize and ignore the Glacier files, or to move on to the next file when S3 returns "Error: The operation is not valid for the object's storage class." I've tried using exclude_pattern in the configuration file, but I believe this fails due to the plugin not receiving suitable information against which to match.
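
The second option (move on when S3 rejects the GET) might look roughly like the sketch below. It is Python rather than the plugin's Ruby, and the function names are hypothetical. It assumes the error arrives as a botocore-style exception whose `response` dict carries the S3 error code, and that the code accompanying "The operation is not valid for the object's storage class" is `InvalidObjectState`.

```python
def error_code(exc):
    """Extract the S3 error code from a botocore-style exception, if present."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def fetch_all(keys, fetch):
    """Call fetch(key) for each key, skipping keys S3 rejects as archived.

    Returns (results, skipped). Any other error is re-raised unchanged.
    """
    results, skipped = {}, []
    for key in keys:
        try:
            results[key] = fetch(key)
        except Exception as exc:
            if error_code(exc) == "InvalidObjectState":
                skipped.append(key)  # archived object: note it and move on
                continue
            raise
    return results, skipped
```

With boto3, `fetch` could be something like `lambda k: s3.get_object(Bucket=bucket, Key=k)["Body"].read()`, where `s3` and `bucket` are whatever client and bucket name you already have.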

Can someone point me in the right direction on how to get the s3 plugin to ignore Glacier files?

warkolm · August 27, 2015, 5:30am

We don't have any concept of Glacier with the S3 plugin.

Plus, retrieval from Glacier is not real-time. I put some thoughts on this in a GH issue, which also applies here: https://github.com/elastic/elasticsearch/issues/12500

zwalls · August 27, 2015, 12:55pm

This issue, I believe, is actually slightly different from the linked GH issue. In the linked issue (if I am correct in my understanding) the problem is that you can't see things that have been rotated to Glacier. The problem I am having with the S3 plugin is that we can see these files, but the plugin does not differentiate between "Glacier" and "Standard" storage types within the bucket, and (as far as I know) cannot "move on" when it fails to pull a file that it assumes is an ordinary S3 object.

I believe this is a recent change made on the AWS side, in which they are listing Glacier-class files along with the Standard-class files in the S3 bucket.

You are correct that Glacier does not operate in real-time, which is why the S3 plugin fails to retrieve the file.

warkolm · August 27, 2015, 9:17pm

It'd be worth raising an issue on the plugin repo then

zwalls · August 28, 2015, 1:27pm

Will do. Thanks for pointing me in the right direction.

rikkuness · March 21, 2017, 2:12pm

I've just hit this same issue. I had a quick search of the plugin repo on GitHub, but I couldn't see any issue raised for this yet.
