TL;DR: I can't get the s3 input with the cloudtrail codec working when the file is gzipped (which is the default for CloudTrail). It does work if I download the file, unzip it, and upload it back into a different S3 bucket.
Details:
I am using logstash 2.2.2.
I started out with a normal CloudTrail bucket created by AWS, and a simple config like this:
input {
  s3 {
    bucket => "cloudtrail-logs"
    codec => cloudtrail {}
  }
}
output {
  stdout { codec => rubydebug }
}
When I run logstash with --debug, I see this:
S3 input: Adding to objects[] {:key=>"AWSLogs/blahblah/CloudTrail/us-east-1/2016/03/15/blahblah_CloudTrail_us-east-1_20160315T1520Z_qQm4gunsTnNuJosk.json.gz", :level=>:debug, :file=>"logstash/inputs/s3.rb", :line=>"116", :method=>"list_new_files"}
S3 input processing {:bucket=>"cloud-analytics-platform-cloudtrail-logs", :key=>"AWSLogs/blahblah/CloudTrail/us-east-1/2016/03/15/blahblah_CloudTrail_us-east-1_20160315T1520Z_qQm4gunsTnNuJosk.json.gz", :level=>:debug, :file=>"logstash/inputs/s3.rb", :line=>"150", :method=>"process_files"}
S3 input: Download remote file {:remote_key=>"AWSLogs/blahblah/CloudTrail/us-east-1/2016/03/15/blahblah_CloudTrail_us-east-1_20160315T1520Z_qQm4gunsTnNuJosk.json.gz", :local_filename=>"/var/folders/8f/1bjm5vq53c73tjq0yl4560dj1r5f6h/T/logstash/blahblah_CloudTrail_us-east-1_20160315T1520Z_qQm4gunsTnNuJosk.json.gz", :level=>:debug, :file=>"logstash/inputs/s3.rb", :line=>"344", :method=>"download_remote_file"}
Processing file {:filename=>"/var/folders/8f/1bjm5vq53c73tjq0yl4560dj1r5f6h/T/logstash/blahblah_CloudTrail_us-east-1_20160315T1520Z_qQm4gunsTnNuJosk.json.gz", :level=>:debug, :file=>"logstash/inputs/s3.rb", :line=>"182", :method=>"process_local_log"}
Pushing flush onto pipeline {:level=>:debug, :file=>"logstash/pipeline.rb", :line=>"450", :method=>"flush"}
Pushing flush onto pipeline {:level=>:debug, :file=>"logstash/pipeline.rb", :line=>"450", :method=>"flush"}
Pushing flush onto pipeline {:level=>:debug, :file=>"logstash/pipeline.rb", :line=>"450", :method=>"flush"}
It just keeps printing that last line over and over and never does anything else. If I look in /var/folders/8f/1bjm5vq53c73tjq0yl4560dj1r5f6h/T/logstash/, I do indeed see the gzipped file, blahblah_CloudTrail_us-east-1_20160315T1520Z_qQm4gunsTnNuJosk.json.gz.
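To rule out a corrupt or truncated download, one quick sanity check is to confirm a .gz file decompresses cleanly with Ruby's Zlib, which (as far as I can tell from the plugin source) is what the s3 input uses for .gz handling. A minimal sketch, using a locally created gzip file as a stand-in for the downloaded temp file:

```ruby
require "zlib"

# Stand-in for the CloudTrail file the s3 input downloaded to its temp dir;
# the path and contents here are just for illustration.
path = "sample.json.gz"
Zlib::GzipWriter.open(path) { |gz| gz.write('{"Records":[]}') }

# Read it back roughly the way the s3 input reads .gz files.
content = Zlib::GzipReader.open(path) { |gz| gz.read }
puts content  # a valid gzip file round-trips to the original JSON
```

If pointing this at the real temp file raises Zlib::GzipFile::Error, the download itself is bad; in my case it reads fine, which makes me suspect the plugin rather than the file.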
Now, if I unzip this file, create a test bucket, put the unzipped file into it, and run logstash pointing at the test bucket, it works fine!
According to the docs at https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html , the s3 input should handle decompression automatically when the filename ends in .gz.
What could be the problem? I am pulling my hair out here.