Up-front info:
- Distro: Amazon Linux
- Logstash: installed via yum
- Elasticsearch: installed via yum
- Both are currently on the same box.
I am currently using the s3 input plugin to ingest gzipped Catalina log files. Recently, due to lifecycle rules on the S3 bucket in question, several of these files have been transitioned to Glacier. While standing up a new test box to try out some modifications to my grok filters, I noticed that Logstash fails to ingest any data from this bucket, logging the following error in logstash.log (bucket name and keys redacted to protect the innocent):
{:timestamp=>"2015-08-26T03:35:01.253000+0000", :message=>"A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::S3 bucket=>"REDACTED", access_key_id=>"REDACTED", secret_access_key=>"REDACTED", debug=>false, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, region=>"us-east-1", use_ssl=>true, delete=>false, interval=>60, temporary_directory=>"/var/lib/logstash/logstash">\n Error: The operation is not valid for the object's storage class", :level=>:error}
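For context, the relevant input block looks roughly like this (reconstructed from the settings echoed in the error above, so the option names should match, but treat it as a sketch rather than a verbatim copy of my config):

    input {
      s3 {
        bucket => "REDACTED"
        access_key_id => "REDACTED"
        secret_access_key => "REDACTED"
        region => "us-east-1"
        interval => 60
        delete => false
        temporary_directory => "/var/lib/logstash/logstash"
      }
    }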
Some digging revealed that, in the AWS web console, several of the items in the bucket now show a Storage Class of "Glacier." Unfortunately, when I list the contents of the bucket with the AWS CLI, I receive no information or tags designating the storage class of the files listed.
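I suspect the storage class is visible through the lower-level s3api interface, something along these lines, though I haven't verified the query syntax against this bucket:

    # List only the keys whose storage class is GLACIER
    aws s3api list-objects --bucket REDACTED \
      --query 'Contents[?StorageClass==`GLACIER`].Key'

The plain listing I was using shows nothing of the sort.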
When I look in the temporary directory Logstash is using (/var/lib/logstash/logstash/), I see a 0 KB file whose name matches one of the files in the bucket that is designated as Glacier.
So, here's what I think is happening:
1. The s3 plugin accesses the bucket.
2. The plugin lists the items in the bucket, receiving a set of keys with no storage-class information attached.
3. The plugin tries a GET on one of the Glacier-class files, with no idea that it isn't a regular S3 object.
4. S3 returns a "Hey, you can't do that" error (easy to reproduce by hand; see below).
5. Repeat steps 1-4.
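Step 4 can be reproduced from the CLI: a direct GET against one of the Glacier-class objects (key below is a placeholder) fails with the same complaint the plugin logs:

    aws s3api get-object --bucket REDACTED --key some/glacier/object.gz /tmp/out.gz
    # An error occurred (InvalidObjectState) when calling the GetObject operation:
    # The operation is not valid for the object's storage class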
What I'd like is to tell the s3 plugin either to recognize and skip the Glacier files, or to move on to the next file when S3 returns "Error: The operation is not valid for the object's storage class." I've tried using exclude_pattern in the configuration file (see below), but I believe this fails because the pattern is matched against key names only, and nothing in the key names identifies the storage class.
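For reference, this is the sort of thing I tried; the pattern itself is illustrative, since my Glacier keys share no naming convention that a regex could latch onto:

    input {
      s3 {
        bucket => "REDACTED"
        region => "us-east-1"
        # exclude_pattern is matched against the object key, which says
        # nothing about storage class, so it can't filter out Glacier files
        exclude_pattern => "\.old\.gz$"
      }
    }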
Can someone point me in the right direction on how to get the s3 plugin to ignore Glacier files?