S3 input plugin is not able to handle encrypted data

Satej · September 5, 2018, 1:23pm

Hello All,
I have set up a Kinesis Firehose stream that accepts log data from a Kinesis data stream and stores these logs as a backup in an S3 bucket in gz format.
For this configuration I have kept KMS encryption on, with its value set to "aws/s3", which is the default.
Now I am trying to parse the KMS-encrypted logs that Kinesis Firehose stores on S3 in gz format, but I am not getting the data in decrypted form. I am using the following configuration to fetch the data from the S3 bucket:

input {
  s3 {
    access_key_id => "my_access_key_id"
    secret_access_key => "my_secret_access_key"
    bucket => "my_bucket_name"
    region => "myregion"
    interval => 3
  }
}

output {
  stdout { codec => rubydebug }
}

And the output which I am getting on console is as follows,

Is there any additional setting I need in the S3 input plugin to get the decrypted data?

Thanks for the help in advance.

yaauie · September 5, 2018, 5:51pm

From your description, I'm not quite sure how the encoding and encryption are layered, but if you have:

newline-delimited log messages
that have been gzipped
and are stored in an encrypted state in S3

then you may find the gzip_lines codec useful.

input {
  s3 {
    access_key_id => "my_access_key_id"
    secret_access_key => "my_secret_access_key"
    bucket => "my_bucket_name"
    region => "myregion"
    interval => 3
    codec => gzip_lines {
      charset => "UTF-8" # your source charset
    }
  }
}

The S3 Input acquires and decrypts the files, handing off chunks of gzipped bytes to the codec, which creates Events. By using the GZip Lines Codec, we are able to decompress the gzip-encoded data into plaintext.

Satej · September 6, 2018, 6:18am

Thanks for the reply. I tried setting the codec to gzip_lines, but I am getting the following exception,

For the input data, we have wired up a CloudWatch Logs log group to a Kinesis stream, and this Kinesis stream is attached to a Firehose stream, which stores the data on S3.

The sample decoded input string is as follows,

{"messageType":"DATA_MESSAGE","owner":"123456789","logGroup":"logstash_test_loggroup","logStream":"logstash_test_loggroup_logstream","subscriptionFilters":["logstash-test-stream"],"logEvents":[{"id":"34257304609312797092785065151043143816257725575611285504","timestamp":1536150666910,"message":"{\"logType\":\"critical\",\"loglevel\":\"critical\",\"Method\":\"GET\",\"Url\":\"/dummy\",\"Status\":200,\"Length\":15,\"CorrelationId\":\"aa4e93f9-a310-46ce-aff3-54ef94c069ad\",\"LogCreationTime\":\"2018-9-5 18:01:05\"}"}]}
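
For anyone who wants to check the structure of such a record outside Logstash, here is a minimal Python sketch. It assumes the record is already decrypted and decompressed, and that each logEvents[].message is a JSON document stored as a string, as in the sample above. The names flatten_log_events and parse_message are made up for illustration, and the sample is a shortened copy of the record.

```python
import json


def parse_message(text):
    """Parse the inner application log line, falling back to the raw text."""
    try:
        body = json.loads(text)
    except (TypeError, ValueError):
        body = None
    # Only a JSON object can be merged into the event; keep anything else raw.
    return body if isinstance(body, dict) else {"raw_message": text}


def flatten_log_events(payload):
    """Return one dict per log event, merged with its parsed message."""
    record = json.loads(payload)
    events = []
    for event in record.get("logEvents", []):
        flat = {
            "logGroup": record.get("logGroup"),
            "logStream": record.get("logStream"),
            "timestamp": event.get("timestamp"),
        }
        flat.update(parse_message(event.get("message")))
        events.append(flat)
    return events


# Shortened copy of the sample record; json.dumps takes care of the escaping.
sample = json.dumps({
    "messageType": "DATA_MESSAGE",
    "logGroup": "logstash_test_loggroup",
    "logStream": "logstash_test_loggroup_logstream",
    "logEvents": [{
        "timestamp": 1536150666910,
        "message": json.dumps({"logType": "critical", "Method": "GET",
                               "Url": "/dummy", "Status": 200}),
    }],
})

for flat_event in flatten_log_events(sample):
    print(flat_event)
```

The point is only that there are two JSON layers: the outer subscription record and the inner application message, which has to be parsed a second time.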

yaauie · September 6, 2018, 8:32pm

I was wrong. The gzip_lines codec is a community-provided codec that requires its data to be something that responds to read (like an IO or an open File), whereas by default most inputs pass a string to the codec. In its current state, it doesn't appear to work with most inputs.
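
To illustrate that mismatch, here is a small Python analogy. It is not the codec's actual Ruby code; gunzip_lines is a hypothetical helper that, like the codec as described, insists on a file-like object with a read method, so raw bytes have to be wrapped in a stream before they are accepted.

```python
import gzip
import io


def gunzip_lines(source):
    """Yield decoded lines from a gzip stream; source must be file-like."""
    if not hasattr(source, "read"):
        raise TypeError(
            "expected a file-like object with read(), got %s" % type(source).__name__
        )
    with gzip.GzipFile(fileobj=source) as stream:
        for line in stream:
            yield line.decode("utf-8").rstrip("\n")


payload = gzip.compress(b"first\nsecond\n")

# Raw bytes are rejected, much like a plain string handed to the codec.
try:
    list(gunzip_lines(payload))
except TypeError as error:
    print("rejected:", error)

# Wrapping the same bytes in an in-memory stream satisfies the contract.
print(list(gunzip_lines(io.BytesIO(payload))))
```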

The S3 Input is documented to automatically decompress .gz inputs, so it should be able to decompress without a special codec. Can you run the original again with debug-level logging enabled? This can be done with either the log.level: debug setting in your logstash.yml configuration, or with the --log.level debug command-line flag.
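
One more thing that may be worth ruling out, given the CloudWatch Logs to Kinesis to Firehose chain: CloudWatch Logs subscription data is already gzip-compressed when it is delivered, so if GZIP compression is also enabled on the Firehose stream, the object in S3 could be compressed more than once. This is only a possibility, not a diagnosis. A minimal Python sketch for checking a manually downloaded object follows; peel_gzip is a made-up helper, and the test data is generated in place rather than read from your bucket.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def peel_gzip(data, max_layers=5):
    """Decompress repeatedly while the data still looks like gzip.

    Returns the innermost bytes and the number of layers removed.
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


# Simulate an object that was compressed twice on its way to S3.
inner = b'{"messageType":"DATA_MESSAGE"}'
twice = gzip.compress(gzip.compress(inner))

plain, layers = peel_gzip(twice)
print(layers, plain)
```

To try it on real data, read a local copy of one object in binary mode and pass the bytes to peel_gzip. If it reports more than one layer, a single decompression pass would still leave gzip bytes behind, which would look like unreadable output on the console.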

system · October 4, 2018, 8:46pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
