S3 input plugin is not able to handle encrypted data

Hello All,
I have setup the "Kinesis Firehose stream" which accepts the log data from "kinesis data stream" and stores these logs as backup on S3 bucket in gz format.
For this configuration I have kept the KMS encryption on and its value is set to "aws/s3" which is default one.
Now, I am trying to parse the KMS encrypted logs which are stored on S3 in gz format by Kinesis firehose, but I am not getting the data in decrypted format. I am using following configuration to fetch the data from S3 bucket,

input {
  s3 {
    "access_key_id" => "my_access_key_id"
    "secret_access_key" => "my_secret_access_key"
    "bucket" => "my_bucket_name"
    "region" => "myregion"
    "interval" => 3
  }
}

output {
  stdout { codec => rubydebug }
}

The output I am getting on the console is as follows:

Is there any additional setting I need in the S3 input plugin to get the decrypted data?

Thanks for the help in advance.

From your description, I'm not quite sure how the encoding and encryption are layered, but if you have:

  • newline-delimited log messages
  • that have been gzipped
  • and are stored in an encrypted state in S3

then you may find the gzip_lines codec useful.

input {
  s3 {
    access_key_id => "my_access_key_id"
    secret_access_key => "my_secret_access_key"
    bucket => "my_bucket_name"
    region => "myregion"
    interval => 3
    codec => gzip_lines {
      charset => "UTF-8" # your source charset
    }
  }
}

The S3 Input acquires and decrypts the files, handing off chunks of gzipped bytes to the codec, which creates Events. By using the GZip Lines Codec, we are able to decompress the gzip-encoded data into plaintext.

Thanks for the reply. I tried setting the codec to gzip_lines, but I am getting the following exception:
CharsetSetting

For input data, we have wired up a CloudWatch Logs log group to a Kinesis data stream, and that Kinesis stream is attached to a Firehose stream which stores the data on S3.

A sample decoded input string is as follows:

{"messageType":"DATA_MESSAGE","owner":"123456789","logGroup":"logstash_test_loggroup","logStream":"logstash_test_loggroup_logstream","subscriptionFilters":["logstash-test-stream"],"logEvents":[{"id":"34257304609312797092785065151043143816257725575611285504","timestamp":1536150666910,"message":"{"logType":"critical","loglevel":"critical","Method":"GET","Url":"/dummy","Status":200,"Length":15,"CorrelationId":"aa4e93f9-a310-46ce-aff3-54ef94c069ad","LogCreationTime":"2018-9-5 18:01:05"}"}]}

I was wrong. The gzip_lines codec is a community-provided codec that requires its input to be something that responds to read (like an IO or an open File), but by default most inputs pass a string to the codec. In its current state, it doesn't appear to work with most inputs.

The S3 input is documented to automatically decompress .gz inputs, so it should be able to decompress without a special codec. Can you run the original configuration again with debug-level logging enabled? This can be done with either the log.level: debug setting in your logstash.yml configuration, or with the --log.level debug command-line flag.
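
For example, either of the following should do it (the pipeline path below is just a placeholder):

  # logstash.yml
  log.level: debug

  # or on the command line
  bin/logstash -f /path/to/your/pipeline.conf --log.level debug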
