Read *.json.gz from AWS S3 bucket

Hi All,

I am new to the ELK Stack and am trying to read data from S3 buckets. The JSON data is gzip-compressed, and the folder structure in the S3 bucket looks like this:

YYYY-MM-DD/*.json.gz

2021-12-01/A.json.gz
2021-12-02/B.json.gz
2021-12-03/C.json.gz

Each folder can contain multiple files.

I'm looking for a code snippet to read these files using Logstash.

Read the file input plugin docs: File input plugin | Logstash Reference [7.16] | Elastic.
Its read mode supports gzip file processing, but I believe you then have to define a gzip codec in your input.
However, try it without the codec first and see if read mode works on its own; I haven't tried that before. Also note the file input reads from the local filesystem, so the objects would need to be downloaded or synced from S3 to the Logstash host first.

input {
  file {
    path => ["/var/log/202*/*.json.gz"]   # local path to the downloaded files
    codec => "gzip_lines"                 # decompress and emit one event per line
    mode => "read"                        # read each file once from start to finish
  }
}
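
One thing to watch with read mode: file_completed_action defaults to delete, so source files are removed after they are read. Here is a sketch of the codec-less variant suggested above that also keeps the files around; the log path is just an example value you would replace:

input {
  file {
    path => ["/var/log/202*/*.json.gz"]
    mode => "read"                        # read mode is documented to handle .gz files itself
    file_completed_action => "log"        # log completed files instead of deleting them
    file_completed_log_path => "/var/log/logstash/completed.log"   # example path
  }
}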

From the S3 input's docs:

Each line from each file generates an event. Files ending in .gz are handled as gzip’ed files.

Since the S3 input is line-oriented, if the contents of your gzip files are not line-oriented (for example, each file is a single JSON blob representing one object), you may need to use the multiline codec to buffer all of the lines into a single event, and then a json filter to parse the contents into a structured object:

input {
  s3 {
    bucket => ""
    access_key_id => "1234"
    secret_access_key => "secret"
    codec => multiline {
      pattern => "." # anything
      what => "previous" # accumulate until EOF
    }
  }
}
filter {
  json {
    source => "message"
  }
}
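
Since your keys are laid out as YYYY-MM-DD/*.json.gz, you can also narrow what the input lists with the s3 input's prefix option, and set the bucket's region. A sketch with placeholder values; the bucket name, region, and prefix below are assumptions you'd replace with your own:

input {
  s3 {
    bucket => "my-bucket"     # placeholder: your bucket name
    region => "us-east-1"     # placeholder: your bucket's region
    prefix => "2021-12-"      # only fetch keys under the December 2021 folders
    codec => multiline {
      pattern => "."          # anything
      what => "previous"      # accumulate until EOF
    }
  }
}

Per the quoted docs line, objects ending in .gz are decompressed automatically, so no extra gzip codec should be needed here.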
