S3 Input Plugin Choking on Glacier Files


#1

Up-front info:
Distro - Amazon Linux
Logstash - yum-installed
Elasticsearch - yum-installed
These are on the same box, at present.

I am currently using the s3 input plugin to ingest .gz'ed catalina log files. Recently, due to life-cycle rules on the S3 bucket in question, several of these files have been rotated to Glacier. When standing up a new test box to try out some modifications to my grok filters, I noticed that Logstash fails to ingest any data from this bucket and returns the following error in logstash.log (bucket name and keys redacted to protect the innocent):

{:timestamp=>"2015-08-26T03:35:01.253000+0000", :message=>"A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::S3 bucket=>"REDACTED", access_key_id=>"REDACTED", secret_access_key=>"REDACTED", debug=>false, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, region=>"us-east-1", use_ssl=>true, delete=>false, interval=>60, temporary_directory=>"/var/lib/logstash/logstash">\n Error: The operation is not valid for the object's storage class", :level=>:error}

Some digging revealed that, in the AWS web interface, several of the items in the bucket are now listed with Storage Class "Glacier." Unfortunately, when I list the contents of the bucket with the AWS CLI, I receive no information or tags designating the storage class of the files listed.

When I look into the temporary directory Logstash is using (/var/lib/logstash/logstash/), I see a 0 KB file whose name matches one of the files in the bucket that is designated as Glacier.

So, here's what I think is happening:

  1. s3 plugin accesses the bucket
  2. s3 plugin lists items in bucket, receiving a list of files with no information regarding storage class
  3. s3 plugin tries a GET on one of the Glacier-class files (with no idea that it's not a regular s3 file)
  4. S3 returns a "Hey, you can't do that" message.
  5. Repeat 1-4
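One caveat on step 2: the raw S3 ListObjects API response does carry a StorageClass field for each object, even if a given CLI listing doesn't print it. Whether the plugin's own listing exposes that field is an assumption on my part. If it does, filtering before the GET could look roughly like this sketch (Python rather than the plugin's Ruby; the function name, constant, and sample keys are all made up for illustration):

```python
# Storage classes that cannot be read with a plain GET until the object is restored.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def readable_keys(listing):
    """Return the keys whose storage class allows a direct GET.

    `listing` is assumed to be shaped like the "Contents" array of an
    S3 ListObjects response: dicts with "Key" and "StorageClass".
    Entries without a storage class are treated as STANDARD.
    """
    return [
        entry["Key"]
        for entry in listing
        if entry.get("StorageClass", "STANDARD") not in ARCHIVED_CLASSES
    ]


# Hypothetical listing for illustration only; not real bucket contents.
sample_listing = [
    {"Key": "logs/catalina.2015-08-25.log.gz", "StorageClass": "STANDARD"},
    {"Key": "logs/catalina.2015-05-01.log.gz", "StorageClass": "GLACIER"},
]

print(readable_keys(sample_listing))
```

With a filter like this, step 3 would never be attempted for the Glacier-class entries, so the loop in step 5 would not get stuck on them.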

What I'd like to do is to tell the s3 plugin to either recognize and ignore the Glacier files, or to move on to the next file when S3 returns "Error: The operation is not valid for the object's storage class." I've tried using exclude_pattern in the configuration file, but I believe this fails due to the plugin not receiving suitable information against which to match.
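To make the second option concrete, here is a minimal sketch of "move on to the next file" behavior. It is not the plugin's code: the exception class, `ingest_all`, and `fake_fetch` are hypothetical stand-ins, written in Python for brevity. S3 reports the error quoted above under the error code InvalidObjectState, which is what the stand-in exception is named after.

```python
class InvalidObjectState(Exception):
    """Stand-in for S3's InvalidObjectState error (hypothetical class for this sketch)."""


def ingest_all(keys, fetch):
    """Fetch every key, skipping the ones S3 refuses because of their storage class.

    `fetch` is any callable that downloads one key and raises
    InvalidObjectState for objects that are not directly readable.
    Returns (ingested_keys, skipped_keys).
    """
    ingested, skipped = [], []
    for key in keys:
        try:
            fetch(key)
        except InvalidObjectState:
            # Record the key and carry on instead of restarting the whole run.
            skipped.append(key)
            continue
        ingested.append(key)
    return ingested, skipped


def fake_fetch(key):
    """Pretend downloader: keys containing 'glacier' behave like archived objects."""
    if "glacier" in key:
        raise InvalidObjectState(
            "The operation is not valid for the object's storage class"
        )
    return b""


ingested, skipped = ingest_all(["a.gz", "glacier/b.gz", "c.gz"], fake_fetch)
print(ingested, skipped)
```

The point of the design is that one unreadable object only costs a skipped key, rather than an "unrecoverable error" that restarts the plugin and hits the same object again.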

Can someone point me in the right direction on how to get the s3 plugin to ignore Glacier files?


(Mark Walkom) #2

We don't have any concept of Glacier with the S3 plugin.

Also, retrieval from Glacier is not real-time. I put some thoughts on this in a GH issue that also applies here: https://github.com/elastic/elasticsearch/issues/12500


#3

I believe this issue is actually slightly different from the one in the linked GH issue. There (if I understand it correctly), the problem is that you can't see things that have been rotated to Glacier. The problem I am having with the S3 plugin is that we can see these files, but the plugin does not differentiate between "Glacier" and "Standard" storage classes within the bucket and (as far as I know) cannot "move on" when it fails to pull what it thinks is an S3 file like any other.

I believe this is a recent change made on the AWS side, in which they are listing Glacier-class files along with the Standard-class files in the S3 bucket.

You are correct that Glacier does not operate in real-time, which is why the S3 plugin fails to retrieve the file.


(Mark Walkom) #4

It'd be worth raising an issue on the plugin repo then :)


#5

Will do. Thanks for pointing me in the right direction.


(Darrian) #6

I've just hit this same issue. I had a quick search on the plugin repo on GitHub, but I couldn't see any issue raised for this yet.

